Chapter 1 

Introduction 



In this thesis I study - both numerically and theoretically - how to do linear algebra 
with disordered sparse matrices that have a spatial structure. Here's what I mean: 

• Spatial Structure: The basis represents a manifold; a volume; a set of D coor- 
dinates living in a D-dimensional world. (I will use the word "system" to refer 
to this volume.) The basis may also include other, non-spatial, coordinates. 

• Sparse: When represented in the position basis, the matrices are nearly diag- 
onal; i.e. the matrix elements $\langle x|M|x+\delta\rangle$ are zero if the distance $|\delta|$ is not 
small. However, it is important that some matrix elements off the diagonal 
($|\delta| \neq 0$) be non-zero, in order to describe the system's spatial structure. 
These non-zero off-diagonal elements are often called the kinetic part of the 
matrix. 

• Disordered: The matrix elements vary significantly at spatial scales which 
are small compared to the size of the system. One can define the scale of 
variation $\lambda$ as $\lambda^{-1} = |\nabla \ln \langle x|M|x+\delta\rangle|$; being disordered means that $\lambda$ is 
much smaller (a factor of ten or more) than the linear size of the system. 

• Linear Algebra: I study the eigenvectors and eigenvalues of these matrices, as 
well as their Green's function (also known as the resolvent), exponential, 
logarithm, density matrix (step function), Gaussian function, and Cauchy 
distribution. These are all problems from linear algebra. 

• Numerics: Numerical algorithms for calculating these quantities typically run 
in $O(N^3)$ time where $N$ is the basis size and is proportional to the system's 
volume. Therefore even modern computers are limited to small systems, 
for instance cubes of width $\sim 20$. In this thesis I examine some approximate 
algorithms - called $O(N)$ algorithms - which allow computation of much larger 
systems. These algorithms rely on certain assumptions about the physics of 
the system, and this thesis goes a long way toward understanding what those 
assumptions are and when they are justified. 

• Theory: I model these disordered matrices with models where some of the 
matrix elements are determined by random numbers. Therefore one has a 
large set of possible disordered matrices, each member of the set distinguished 
by a particular choice of random numbers. I follow the usual theoretical 
approach of computing averages over this set. Using various techniques, I 
compute averages of multiples of the Green's function, and also estimate 
average eigenfunctions. 
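
The definitions in the list above can be made concrete with a toy example. The following is a minimal sketch, not a model taken from this thesis: it assumes a one-dimensional chain of sites, a random value on each diagonal element (the disorder), and a constant nearest-neighbour coupling (the kinetic part). The function name `disordered_chain` and its parameters are hypothetical.

```python
import random

def disordered_chain(n, hopping=1.0, disorder=1.0, seed=0):
    """Return an n x n matrix (a list of rows) for a 1D chain of n sites."""
    rng = random.Random(seed)
    m = [[0.0] * n for _ in range(n)]
    for x in range(n):
        # Disordered part: an independent random value on every site.
        m[x][x] = rng.uniform(-disorder, disorder)
        # Kinetic part: couple each site to its nearest neighbour only,
        # so every element with |delta| > 1 stays exactly zero (sparse).
        if x + 1 < n:
            m[x][x + 1] = hopping
            m[x + 1][x] = hopping
    return m

matrix = disordered_chain(8)
nonzero = sum(1 for row in matrix for value in row if value != 0.0)
print(nonzero, "non-zero elements out of", 8 * 8)
```

The number of non-zero elements grows in proportion to the number of sites, while the total number of elements grows as its square; this is the sense in which such a matrix is sparse.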

Why are these problems important, and why are they physics? Because any 
physical system with independent degrees of freedom acting at two or more dif- 
ferent length scales can be considered to be disordered. This includes mesoscopic 
physics where electrons flow large distances through an impure and spatially dis- 
torted medium, as well as lattice QCD where neutrons and protons are collective 
excitations of much smaller quarks and gluons. But it also includes oil flow through 
rock, turbulence, protein behavior in cells, heat and light absorption in clouds, and 
most other important problems. Any system where renormalization ideas are appli- 
cable is a disordered system, though actually most renormalization techniques make 
an additional requirement that there be a self-similarity between the various scales. 
In all of these systems the short-scale physics makes a huge qualitative, not just 
quantitative, difference in the large-scale physics. 

It is in precisely these problems that even the fastest computers are just not up 
to the job. Consider a three-dimensional system with just two relevant length scales, 
one ten times the size of the other. To treat both length scales accurately one will 
need to use a grid roughly 100 points on a side, for a total of a million points in 
the volume. Linear algebra tasks that run in $N^3 = (10^6)^3 = 10^{18}$ time are already 
well beyond the capability of modern computers. These $O(N^3)$ tasks can include 
simulating a system's movements, its response to external forces, or its equilibrium 
state. In condensed matter physics simulating movement caused by electronic forces 
requires evaluating the density matrix function (an $O(N^3)$ problem), in lattice QCD 
calculating the ground state requires evaluating the logarithm or the inverse ($O(N^3)$ 
again), and in engineering calculating a structure's response to external forces is also 
an $O(N^3)$ calculation. As a consequence, many important problems remain well 
out of reach of numerical study. 
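
The arithmetic in this paragraph can be checked with a few lines of code. This is only an illustrative sketch: the function name `cubic_cost` is hypothetical, and the machine speed of 10^12 operations per second is an assumption chosen for the example, not a figure from the text.

```python
def cubic_cost(points_per_side, dimension=3, ops_per_second=1e12):
    """Basis size N, operation count N^3, and run time for one evaluation."""
    n = points_per_side ** dimension      # number of grid points (basis size)
    operations = n ** 3                   # cost of one O(N^3) task
    seconds = operations / ops_per_second # assumed machine speed (hypothetical)
    return n, operations, seconds

n, operations, seconds = cubic_cost(100)
print(f"N = {n:.0e}, N^3 = {operations:.0e}, "
      f"about {seconds / 86400:.1f} days per evaluation")
```

Under the assumed speed a single evaluation already takes more than a week, and a simulation typically repeats such an evaluation many times.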

Disorder at short distances may drastically change the physics at large distances. 
In particular, it can cause a disconnection between points that are far apart, so that 
what happens at a given point is independent of everything far away from it. In real 
life people often make this "nearsightedness" assumption, i.e. that what happens 
here is largely independent of things far away, but they rarely have rigorous justi- 
fication. Disorder can give the nearsightedness assumption solid backing. A little 
over ten years ago, some condensed matter physicists began to use this assump- 
tion to justify using certain approximate algorithms to solve a problem which would 
require $O(N^3)$ time if a non-approximate algorithm were used [1,2]. These new 
approximate algorithms are called $O(N)$ algorithms or, alternatively, linear scaling 
algorithms, because they run in a time which is proportional to the basis size; they 
are much, much faster than $O(N^3)$. 
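
To illustrate how a nearsightedness assumption leads to linear scaling, here is a minimal sketch. It is not one of the algorithms of refs. [1,2] nor any algorithm studied in this thesis; it assumes a toy one-dimensional chain, and it evaluates the inverse at a value `z` deliberately chosen far outside the spectrum so that locality is guaranteed. All function names are hypothetical. One diagonal element of the inverse is computed twice: once from the whole matrix, and once from only the rows and columns within a fixed radius of the site.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination without pivoting
    (adequate for the diagonally dominant matrices built below)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            if f != 0.0:
                for j in range(k, n):
                    a[i][j] -= f * a[k][j]
                b[i] -= f * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x

def shifted_chain(n, z, seed=0):
    """z - M for a chain M with random diagonal in [-1, 1] and hopping 1."""
    rng = random.Random(seed)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = z - rng.uniform(-1.0, 1.0)
        if i + 1 < n:
            a[i][i + 1] = a[i + 1][i] = -1.0
    return a

def diagonal_inverse(a, site, radius=None):
    """Element (site, site) of the inverse of a; if radius is given, use
    only the rows and columns within that distance of the site."""
    lo, hi = 0, len(a)
    if radius is not None:
        lo, hi = max(0, site - radius), min(len(a), site + radius + 1)
    block = [row[lo:hi] for row in a[lo:hi]]
    rhs = [0.0] * (hi - lo)
    rhs[site - lo] = 1.0
    return solve(block, rhs)[site - lo]

a = shifted_chain(60, z=5.0)
exact = diagonal_inverse(a, site=30)
for radius in (1, 2, 4, 8):
    print(radius, abs(diagonal_inverse(a, site=30, radius=radius) - exact))
```

Because the cost of each truncated solve depends only on the radius and not on the system size, repeating it for every site costs a time proportional to the basis size. Whether the truncation error is acceptable in less favorable cases is exactly the kind of question that requires justification.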

Almost all $O(N)$ algorithms were developed specifically for calculating the den- 
sity matrix function, which is important in simulating motion of atoms in condensed 
matter physics. In the context of condensed matter physics it was not clear whether 
the nearsightedness assumption was correct, and it was unclear whether $O(N)$ al- 
gorithms could be justified. Physicists pursued both theoretical and computational 
strategies to support the $O(N)$ algorithms. On a theoretical level, they justified the 
nearsightedness assumption not with disorder but instead with the band theory of 
crystals - a bit strange considering that often they were simulating non-crystalline 
materials, and also considering that, in the important case of pure metals, band the- 
ory does not predict strong localization. On a computational level, physicists used 
three kinds of evidence to support $O(N)$ approximations: tests that each $O(N)$ 
algorithm converged to a single, unambiguous result, checks that on a qualitative 
level the simulated atoms did more or less what was expected, and demonstrations 
that much, much larger systems could be simulated. Within these criteria the $O(N)$ 
algorithms seemed to work well. There have been very few [3,4] more rigorous com- 
putational checks, and these concerned themselves with the accuracy of the Green's 
function, which certain $O(N)$ algorithms compute as an intermediate step in the 
process of computing the density matrix. 

In my thesis research I did the first quantitative and systematic check of the 
accuracy of $O(N)$ algorithms for calculating the density matrix. I studied a class of 
$O(N)$ algorithms which I call basis truncation algorithms. Contrary to my previous 
expectations, I found that $O(N)$ algorithms can be very accurate even in metals 
with quite small disorder. Examining the results in detail, I found that the algo- 
rithms' accuracy is determined by the coherence length of the eigenfunctions, and 
that they remain accurate even when the eigenfunctions are extended throughout the volume of the 
system. To repeat, even when the eigenfunctions are extended, $O(N)$ algorithms 
which ignore any long-distance physics can still provide accurate results. I also found 
a formula which accurately predicts the error. These results have been accepted 
for publication; an expanded version is given here in a later chapter. This research is 
significant for anybody who wants to simulate a system with several length scales, 
since it widens the class of problems where $O(N)$ algorithms are applicable, suggests 
that the coherence length is the best way of determining when these algorithms are 
applicable, and gives a tested formula for estimating the error. 

Many of the same $O(N)$ algorithms used to calculate the density matrix can also 
be applied to computing other matrix functions which are important in many other 
fields, but these opportunities remain largely unexplored. I followed up my density 
matrix research by studying the accuracy of $O(N)$ algorithms when computing 
the inverse, logarithm, exponential, Cauchy (Lorentz) distribution, and Gaussian 
distribution. I plan to publish these results, which are described in chapter 4. 
They are significant because they again extend the range of applicability of $O(N)$ 
algorithms, this time to the most important matrix functions in physics and the 
other sciences. 

One of the original motivations for this thesis is the challenge of simulating 
fermions in lattice QCD. I review this problem and its connections with disordered 
systems in a later chapter. Briefly, modern supercomputers have considerable difficulty 
in doing "unquenched" calculations; i.e. calculations with dynamical quark degrees 
of freedom, or, in other words, with fermion loops. Numerically, the difficulty is 
that one must repeatedly calculate either a determinant (related to the matrix 
logarithm) or a matrix inverse. Both of these problems run in $O(N^3)$ time. My 
goal is to develop new $O(N)$ algorithms for this problem, and in fact this was the 
motivation for studying the logarithm and the inverse in chapter 5. 

While developing theoretical estimates of the error of $O(N)$ algorithms, I had to 
estimate matrix elements and eigenfunctions in disordered systems. About twenty- 
five years ago, Berry introduced a model of incoherent, unlocalized, nonfractal 
eigenfunctions [5]. I developed a generalization of his model which is able to de- 
scribe both localized eigenfunctions and multifractal eigenfunctions. Multifractal 
eigenfunctions are expected in disordered systems, as was first noted by Castellani 
and Peliti [6]. I expect to submit this phenomenological model for publication soon, 
after revisions to reflect some results published over the past few years which hold 
in the limit of small disorder. This work is presented in chapter 3. 

I have a lot of experience in the software industry, having worked almost five 
years for Microsoft. One of my biggest lessons from this experience is that software 
can never be trusted until both it and its results are systematically and compre- 
hensively checked and rechecked. A few years ago I returned to academic life, and 
was immediately confronted by the fact that the scientific community does not do 
systematic checking of its software and routinely publishes papers containing the re- 
sults of computations which are neither documented nor systematically tested, and 
are neither archived nor reproducible. According to the standards of the software 
industry this sort of conduct is very unprofessional. However this conduct could 
possibly be partially excused by the fact that the scientific community addresses 
problems which are somewhat different than those addressed by the software in- 
dustry. It is important to understand the scientific community and the scientific 
process before coming to conclusions about correct software practices. 

In order to decide what my own software practices should be as a physicist, 
I decided to go through a process of researching and reviewing the physics com- 
munity's awareness and management of software unreliability, in other words, bugs. 
This work culminated in two things: first, realizing that it is very important to make 
one's computations reproducible - i.e. to allow both oneself and other parties to 
easily and reliably reproduce one's own results, if possible simply by installing and 
running the script or program one has written. The second result was a set of rec- 
ommendations about scientific discipline, for both individuals and organizations. 
I wrote this work up into a rough draft of a paper on the reliability of scientific com- 
putations and wrote to a scientific mailing list where roughly forty people gladly 
gave comments. A second draft is included here as chapter 5. It should be of 
interest to anyone who is concerned about the reliability of the numerical results 
(numbers, graphs, or formulas derived using software) which they either publish 
themselves or read in published scientific papers. There are very few articles on this 
topic within the whole scientific community, let alone the physics community, and I 
am aware of none that have the comprehensive approach which I have taken, with 
both a review of current practices and a holistic set of recommendations for best 
practices. 

My numerical research into the accuracy of $O(N)$ algorithms, documented in 
the chapters on the density matrix and on other matrix functions, was also an exercise in implementing the ideas of my paper 
on reliability. I followed most of the best practices I had suggested, in order to see 
whether they were really such good ideas before submitting the paper for publication. 
One very exceptional best practice is that I am making both this Ph.D. thesis and 
my research papers entirely reproducible; meaning that I supply an automated way 
for interested parties to re-compute all my results, recreate the graphs shown in 
my papers, and recompile the papers themselves. All necessary code and files 
are freely available on the Internet under the GNU General Public License [7] or other free 
licenses. Reproducible papers are an important tool for managing the complexity and 
unreliability of computation, but they are still only beginning to be used. There are 
very few such completely reproducible papers available in the scientific community 
as a whole, perhaps fewer than ten. The lessons I have learned from writing these 
reproducible papers are included in chapter 5. 

One important way of checking numerical results is to develop as many analytical 
predictions as possible, in various limits or special cases, and then to check that 
the software reproduces the correct result. Such analytical predictions can be very 
difficult to obtain if the problem being calculated is complex. In particular, there 
are currently no analytical predictions that can be used to test the algorithms used 
to simulate fermions in lattice QCD, except in test cases where gauge fields have 
very special smooth configurations. These cases therefore fall outside the region of 
physical interest, where the gauge field has its own independent degrees of freedom 
and is therefore disordered. One of my research goals is to develop analytical 
predictions about fermionic algorithms in the presence of disorder, which would 
provide for the first time the ability to check the convergence and accuracy of 
fermionic algorithms in physically realistic problems. This would allow me to not 
only check the existing fermionic algorithms but also compare and contrast them 
with new $O(N)$ algorithms. 

The overlapping communities of mesoscopic physics and random matrices have 
developed a variety of field theoretical techniques for making analytical predictions 
about disordered systems. In the limit of weak disorder, they are able to make 
analytical predictions about eigenvalues, eigenfunction self-correlation, correlation 
between eigenfunctions, products of Green's functions, etc. Two field theories are 
available for these calculations: the first is a sigma model, based on a "replica" 
technique, which involves modelling many species of particles, with all the species 
having the exact same physical properties. This replica technique is very flexible and 
has been applied to a wide range of very hard problems. Models derived using the 
replica technique correctly describe physical systems where there really are many 
species of particles. However, the replica sigma model is most often used to describe 
physical systems with only one species of particle. In this case, the replica model is 
correct to all orders of perturbation theory, but can be wrong when treating non- 
perturbative phenomena [8]. The second field theory is also a sigma model, but is 
called a supersymmetric sigma model because it is based on using graded matrices, 
i.e. matrices containing both Grassmann variables and non-Grassmann variables. This 
field theory in terms of graded matrices is derived from the original field theory via a 
Hubbard-Stratonovich transformation. The supersymmetric sigma model is confined 
to a much smaller set of problems than the replica sigma model, but within its 
range of applicability it distinguishes itself by correctly handling non-perturbative 
phenomena. 

To date, no work has been done generalizing these models to the problems I'm 
interested in, namely fermions moving in a disordered potential, with the fermions 
modeled by a determinant as is normal in quantum field theory. There is no real 
conceptual difficulty with this generalization, it's just that no one has realized that 
it would be useful, partly because of the scientific community's limited interest 
in testing its computations. However, a considerable amount of work would be 
required to generalize the supersymmetric sigma model: one's graded matrices 
would have more fermions than before, and there is a complicated and tricky step 
in the supersymmetric method where one factors the supersymmetric matrix into a 
product of other matrices. So despite the lack of conceptual difficulty, the actual 
dirty work of this generalization is a bit daunting. Generalizing the replica sigma 
model is much easier, but one can't be sure of the non-perturbative results. 

So I was preparing to do this generalization, but along the way I realized that 
there is a third sigma model, which can be used to obtain all the results which 
are accessible to the supersymmetric sigma model, but does not involve graded 
matrices. This realization was stimulated by a recent paper by Fyodorov [9] which 
derived the third model in the case of a system with no spatial extent; i.e. one which 
is a single point in space. At the end of his paper he wrote "Finally, it is interesting 
to explore if the Ingham-Siegel integrals and their natural generalizations could 
provide a serious alternative to the Hubbard-Stratonovich transformation in the 
whole domain of random matrices and disordered systems." However, Fyodorov did 
not begin to follow up this speculation until September 5, 2004, when he submitted 
a preprint which, in an appendix, briefly describes how to use his technique to 
derive a replica sigma model on a lattice [10]. The sigma model which he derived 
was first introduced twenty-four years ago, and was the first sigma model derived 
for a disordered system [11-15]. Fyodorov's derivation was exact because he derived 
the sigma model from a Hamiltonian with many identical species of particles. He 
did not treat the more physically important case of a system with only one species 
of particle, which would traditionally require either the replica technique or graded 
matrices. However, he did speculate that this was possible, writing: "Another 
interesting problem is how to include anticommuting degrees of freedom in the 
above derivation. Technically this can be done following various methods." 
Fyodorov then cites both a paper by Zirnbauer which does not include Grassmann 
variables and also unpublished research done by Fyodorov himself. 

I did not become aware of Fyodorov's preprint until early October, 2004. A 
month earlier, in early September, I began thinking that Fyodorov's zero-dimensional 
sigma model can be extended to systems which are spatially extended, and that all 
the results previously obtained with graded matrices can be obtained without them. 
This assertion, if true, would have a large influence on mesoscopic theory, as it 
offers a drastic simplification and also makes the theory much more flexible, so that 
generalizing, for instance, to the problem of fermion determinants in disordered po- 
tentials becomes quite easy. So I began very carefully working through the details 
of deriving the new model and reproducing results already derived using the super- 
symmetric sigma model. This work, contained in a later chapter, is not yet complete, 
but already it contains a detailed derivation of the new sigma model of extended 
systems, as well as the most detailed and thorough tutorial in the literature about 
how to use a field theory to calculate observables in the unitary ensemble of random 
matrices. 

Earlier I examined the accuracy of a certain class of $O(N)$ algorithms, namely 
basis truncation algorithms, which can perform remarkably well. However these 
algorithms are particularly simple-minded, and one might expect to find situations 
where they fail but more sophisticated $O(N)$ algorithms succeed. In a later chapter I 
review all the $O(N)$ algorithms which I am aware of. I then suggest several new 
algorithms which I believe are my own contribution. 

My interest in $O(N)$ algorithms and in disordered systems was stimulated by 
an attempt many years ago to figure out how to apply renormalization ideas to the 
$O(N^3)$ problem of diagonalizing a matrix. The theory of renormalizing linear algebra 
$O(N^3)$ problems is very simple compared to that of renormalizing field theories, but 
it is still very hard to figure out how to use this theory non-perturbatively to obtain 
faster algorithms, because the renormalization itself requires $O(N^3)$ time. When 
renormalizing field theories, there is a similar difficulty (i.e. an infinite number of 
operators are created), which is surmounted by assuming that the renormalized 
theory is local, and also assuming that the system is self-similar between various 
length scales, and also using perturbation theory. Here we are doing numerical 
calculations, which are usually aimed at obtaining non-perturbative information, 
so perturbation theory is not an option. Moreover one wants to avoid making 
assumptions about self-similarity and to compute the relation between length scales 
from first principles. 
However the assumption that renormalization preserves locality can still be use- 
ful, and translates into a sort of nearsightedness principle. Renormalization is 
basically a program of averaging neighboring short distance degrees of freedom to 
produce a smaller set of long distance degrees of freedom. Locality preservation 
means that one assumes that the short distance degrees of freedom influence only 
the long distance degrees of freedom which are close by. In other words, short dis- 
tance degrees of freedom that are far apart interact only via long distance degrees 
of freedom. This is a key assumption; otherwise the perturbative renormalization 
program in field theory would fail. 

$O(N)$ algorithms are simply the result of making the locality assumption which is 
so basic to renormalization. $O(N)$ nearsightedness states that none of the degrees 
of freedom in the problem interact at long distances; that the problem can be 
effectively broken into parts. This is just the simplest case of renormalization's 
locality preservation assumption, which I discussed in the last paragraph. 

It is natural to consider physical problems where there are more length scales, 
and to seek a way of renormalizing away first the smallest length scale, then the 
next smallest, etc. Therefore even before my thesis I analyzed a class of $O(N \ln N)$ 
algorithms which can be viewed as generalizations of $O(N)$ algorithms, and are 
specifically designed to handle multi-scale systems. In a later chapter I explain my new 
multi-scale algorithms, and the very substantial accelerations that they can produce. 
These new algorithms use some of the new $O(N)$ algorithms explained in another 
chapter as building blocks. Both chapters are of interest to persons working on numerical 
computing in problems with multiple length scales. 






Chapter 2 



Layperson's Introduction to 
Chapters SI and H 

2.1 High School Algebra 

Do you remember doing algebra in high school? There were variables x and y, a 
and b, letters which represented numerical quantities but whose values were not 
known. You were given equations relating those variables to numbers, for instance 
5 = 3x + 2. Then your teacher asked you to solve the equation, which meant finding 
out what value the variable had; in this case you would find x = 1. 

Perhaps you were also given equations with two variables in them; x + y = 1. 
It turned out that if you had two variables, you needed two equations in order to 
find a solution. For instance, if you had a second equation y = 1, you could solve 
for x; x = 0. 

And you may even have been told, but probably not, that you can actually have 
as many variables as you like, a whole alphabet, or thousands or millions of them. 
But you also need as many equations as variables before you can find your solution. 
Thus you can imagine having a thousand equations for a thousand variables, and 
solving to find the value of each variable. Of course this would be a nightmare; doing 
two unknowns was already a chore in high school; a thousand would be impossible 
for mere mortals. But computers - champions at busywork - can do this sort of 
thing. 
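
To show what this busywork looks like when handed to a computer, here is a minimal sketch that solves the small examples from the text by systematic elimination, the same procedure one would use by hand. The function name `solve_linear` is hypothetical, and exact fractions are used only to keep the arithmetic free of rounding.

```python
from fractions import Fraction

def solve_linear(coefficients, constants):
    """Solve a square set of linear equations by Gauss-Jordan elimination."""
    n = len(constants)
    rows = [[Fraction(c) for c in row] + [Fraction(k)]
            for row, k in zip(coefficients, constants)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up
        # (this raises StopIteration if the equations have no unique solution).
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        # Eliminate this variable from every other equation.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

# x + y = 1 and y = 1, written as rows of coefficients for (x, y).
x, y = solve_linear([[1, 1], [0, 1]], [1, 1])
print(x, y)  # prints: 0 1
```

The same function accepts any number of equations, as long as there are as many equations as variables and they determine a unique solution.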

In algebra you probably also were introduced to inequalities, graphing, and the 
quadratic equation. If you remember the quadratic equation then you know it was 
exceptionally complicated to solve. This is because all the other equations (and sets 
of equations) you saw were linear, which means that variables were always multi- 
plied by numbers, never by other variables. Nonlinear equations are considerably 
harder than linear ones: for instance nonlinear equations are connected to chaos and 
unpredictable behavior and sometimes do not have any solution at all, while linear 
equations are much better behaved and easier to solve. Yet despite their relative 
simplicity linear equations are interesting and complex enough that there is a whole 
mathematical specialty devoted to them. My concern in this introduction focuses 
on problems formulated in terms of linear equations, which are called linear alge- 
bra problems. One linear algebra problem that is very interesting is the eigenvalue 
problem, which is a way of solving many sets of linear equations at the same time. 
2.2 Scientific Calculations 

The reason why I took you on this walk down memory lane is that most sciences 
and applied sciences make predictions by creating large sets of variables and equa- 
tions, and then solving them. For instance, suppose an engineer wanted to predict 
how a structure (a house, a bridge) will respond to forces on it. First she would break 
it down conceptually into pieces: boards, struts, walls, cables, whatever. Second, 
she would represent each piece with a variable. Next, for each piece she would write 
an equation which tells - in a mathematical way - how it acts on, and is acted on 
by, all the other pieces in the structure; i.e. the forces on it. Lastly, having written 
down these equations, our engineer would solve them, using a computer of course. 
This would allow her to predict how a house would respond to an earthquake, or 
whether a bridge would bear a 100 ton load. 

The same sort of thing happens in chemistry. Everything is made up of atoms 
like hydrogen and copper, and each atom has one or more electrons around it. The 
quantum rules of how atoms connect up to make molecules, liquids, and solids are 
determined by what their electrons do; sometimes all the electrons will stay close to 
the atoms that own them; other times the atoms will share electrons. The details 
of this process determine what molecules and materials are formed, their colors 
and consistencies, how chemical reactions happen; basically everything. In order 
to make a quantum calculation of these things, you need at least one variable for 
each atom, and more if you want to be more accurate. You also need a set of 
equations saying how each electron is influenced by the neighboring electrons. With 
the variables and equations in place, you need to solve the equations to find out 
what really happens. This sort of procedure has worked very well for calculating the 
properties of smallish molecules with up to a few hundred atoms, and also a lot of 
crystals because they can be broken down into their smallest building block, which 
is often just a few atoms. 

2.3 Speed Problems 

There is a problem, though: modern computers often can't solve the equations in 
a reasonable amount of time. It turns out that solving the eigenvalue problem (and 
also a lot of other linear algebra problems) with a few thousand variables stretches 
their limits, especially when (as often happens) you have to repeatedly change the 
numbers a bit and then solve again. The practical consequences are pretty bad. 
For instance, to calculate what happens in biology you have to deal with proteins, 
which contain thousands or tens of thousands of atoms. And these proteins are 
immersed in liquids, which contain many more atoms. So right now there is no way 
to do a quantum calculation for most biological problems, and therefore accuracy 
is very limited. 

The same sort of issue comes up with many other important scientific and 
engineering problems. For instance, if you wanted to predict the behavior of a 
skyscraper or supertanker you might want to think about all their beams and struts 
and plates and joints, but you can't because there are too many. In this case what 
you would have to do is start grouping things together into units (for instance a 
set of struts making a truss), and assume that each unit moves together. This 
simplification will allow you to calculate stuff on a computer, but it will also be less 
accurate, as you can see if you consider that your computer is no longer allowing 
for the possibility that units may twist, bend, or even break apart. 

You may have heard that computer speeds are growing fast. In fact their 
speed is growing exponentially, just like the world population. Of course such 
explosions can't last forever; eventually a limit is reached, but you could still imagine 
(optimistically) that we might not hit the limit for another twenty years, in which 
case computer speeds might multiply by a thousand. Then it would be interesting 
to know what the speed increase would do for our ability to solve equations. 

There is a whole mathematical discipline devoted to analyzing different problems 
and figuring out how much effort is required to solve them. Some problems are so 
inherently complex that multiplying your speed (or, alternatively, multiplying the 
time you spend) doesn't really do much. A simple example of this would be trying 
to understand how to move in a board game like chess, if you know the rules but 
don't know how to play the game. (Actually, almost any problem in real life is like 
this if you approach it in a literal, systematic, pencil-pusher sort of way.) Other 
problems are much easier, even for pencil pushers: for example when sorting books 
into alphabetical order, if you have twice as much time, then you can sort almost 
twice as many books. Solving the eigenvalue problem is an intermediate case, but 
still pretty bad: if you have ten times as many variables, you will need to put in a 
thousand times as much effort. Which means that we can't expect faster computer 
speeds to help us much with a lot of important scientific and engineering problems, 
for example accurate quantum calculations of most biological processes. 

You may know from experience that often a hard problem can be made easier by 
figuring out what is important and what is not, and then concentrating on the 
important part. Moreover, if you have a big problem, you might try to break it 
up into smaller problems which can be solved easily one by one. "Divided we fall, 
united we stand." Unfortunately many linear algebra problems are not friendly to 
the ideas of solving one piece at a time, or of figuring out important things first. In 
linear algebra problems, every variable is equally important, and depends on every 
other variable. For example, the equations predict that the movement of one beam in a 
building depends on what's happening to every other part of the building. This is 
why solving the eigenvalue problem is so hard; everything is bundled together into 
the same knot, and you have to find the solution all at once instead of picking the 
knot apart bit by bit. 

2.4 The Distance Strategy and O(N) Algorithms 

When confronted with a roadblock, find another way. In the past ten years many 
scientists have begun to think that maybe linear algebra could be improved upon. 
We know from real life that not every fact is equally important, and in fact we 
know that a lot of stuff can be ignored. Maybe there is some way to modify the 
equations so that they sort out important from unimportant. Of course, you need a 
strategy for deciding what's important for calculating the value of a variable. One 
simple strategy is distance: if something is close by, it's important; if not, it's not. 
Scientists have used this strategy to create new equations where variables depend 
only on things that are close by them. For instance, when using these equations on a 
molecule, electrons only depend on what's happening close by, and therefore you can 
break a molecule up into pieces which can be calculated separately, and everything is 
much quicker. The new equations - called O(N) algorithms (pronounced "order N 
algorithms") - are much easier to solve, and nowadays molecules with many thousands 
of atoms can be calculated. This is very exciting: by choosing a very simple strategy 
(which I will call the distance strategy) of making a black and white distinction 
between near and far, lots of previously impossible problems are suddenly in reach. 

But does the distance strategy make sense? It depends. Imagine a house with 
a tiled roof, and that you have the job of predicting what will happen to one of 
the roof tiles. Well, certainly you need to know about things close to the tile: its 
support, the tiles around it, workmen walking on the roof, and any tree branches 
falling close by. And - at least if the house is well designed - you can ignore things 
a little farther away: whether the windows are open or closed, the number of rooms 
in the house, and even damage to the house: an errant driver, or a tree falling against 
another part of the roof. So, clearly the distance strategy makes some sense. And 
it might make even more sense when applied to other physical problems, like many 
chemistry calculations. 

But there are also circumstances where the distance strategy doesn't work. The 
foundation is the part of the house farthest from our roof tile, and yet if it washes 
away then the roof will not stand on its own. Hundreds of miles away, deep in 
the ground, masses of earth slipping past each other can destroy the house in an 
earthquake. And last week's weather thousands of miles away can determine 
the rain, wind, hail, or sun beating on your tile. 

The next two chapters describe some work that I did on checking when these 
distance-based strategies work, and trying to figure out what decides whether they 
work or fail. I solved some linear algebra problems the correct-but-slow way, and 
also using an O(N) algorithm, and compared the results. I was surprised that the 
O(N) algorithm worked far better than I had expected. 

In this chapter I discuss some fast algorithms for evaluating the density matrix, 
which is a matrix function; i.e. a function whose argument and value are both 
matrices. This is a difficult problem from linear algebra, a field which contains 
immense challenges that have stimulated a very active and diffuse research effort 
ever since computers were introduced. When solving a linear algebra problem one 
must first choose a finite basis. Second, one must ensure stability of the algorithm, 
or in other words ensure that it always converges to a result. Third, one must analyze 
and understand the accuracy of the algorithm; its result must be in some sense close 
to the correct answer. Fourth and lastly, one must accelerate the computation as 
much as possible, whether by choosing an algorithm which is fast or well-suited to 
the problem at hand, or by choosing a basis which matches the physics well and 
thus can be truncated to a small number of basis elements. Traditionally, physicists 
have worked on the choice of basis - and, to some extent, the choice of algorithm 
- while the invention of algorithms, as well as their convergence and accuracy, have 
often been left to computer scientists. 

One of the biggest challenges is a computational bottleneck: evaluation of a 
matrix function generally requires O(N^3) time, where N is the basis size and usually 
scales linearly or worse with the system volume. This prohibits computations of 
large systems, even with modern computers. In particular, if a matrix function must 
be recalculated at every step of a system's evolution in time, then calculations with 
a basis size larger than O(1000-10000) are not practical. For example, quantum 
simulations of a single protein molecule require re-evaluation of the density matrix 
function at every time step. Calculations of protein dynamics, which would involve 
at least O(2000) atoms, are well out of reach [16]; in fact modern computers 
are limited to O(100) atoms. Larger numbers of atoms can be handled only by 
smearing them all together into a jelly (jellium), or else by stepping away from 
quantum physics. Moore's law will not solve this problem: with O(N^3) scaling, a 10^3 
increase of speed allows only a tenfold increase in system volume. No exponential 
growth lasts forever. 
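
The scaling argument can be written out as a short worked calculation. This is only a sketch under the assumption that the cost is exactly cubic in the basis size; the symbols $T$ (run time), $c$ (a machine-dependent constant), and $s$ (the speed-up factor) are introduced here for illustration and do not appear elsewhere in the text.

```latex
\begin{align*}
T &= c\,N^{3}, \\
T \text{ fixed},\; c \to c/s \quad &\Longrightarrow \quad N \to s^{1/3} N, \\
s = 10^{3} \quad &\Longrightarrow \quad N \to 10\,N .
\end{align*}
```

Since N is proportional to the system volume, a thousandfold faster machine buys only a tenfold larger volume under this assumption.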

3.1 O(N) Algorithms 

In the last ten years physicists have taken the lead in developing a new class of 
algorithms motivated by a physical insight: a substance's properties at a point 
A should not depend on things far away from A. For instance, a point lattice 
defect should have little effect on electron orbitals even a few lattice spacings away. 
Although this is a familiar idea, perhaps Kohn [17] is responsible for naming it, 
calling it the "nearsightedness" principle. The nearsightedness principle is translated 
into an expectation that the density matrix function should have a finite range in 
position space, beyond which its matrix elements die off exponentially. Therefore 
the density matrix contains only O(N) non-negligible elements, and one can invent 
algorithms for evaluating the density matrix which run in O(N) time. In 1991 W. 
Yang introduced the first such algorithm, the Divide and Conquer algorithm, which 
first diagonalizes the argument of the density matrix and then approximates the 
density matrix function, all in O(N) time [1]. This stimulated the development 
of many other O(N) algorithms [2, 18-21] which have met considerable success, 
permitting quantum calculations of systems with tens of thousands of atoms. 

O(N) algorithms succeed because they are an application of a very basic problem 
solving principle. Both humans and the computers we create are fundamentally 
limited, and therefore our problem solving must always start by separating the 
important information from the unimportant, the wheat from the chaff. We break 
this basic problem-solving rule when we try to evaluate a matrix function while using 
a large basis. The real problem is probably not the basis size but the conceptual 
framework of linear algebra, which contains no notion of the relative importance 
of information, so that a solution's value can be changed by varying any matrix 
element whatsoever. O(N) algorithms correct this mistake by selectively ignoring 
certain information. 

All O(N) algorithms rely on special characteristics of the system under study, 
and the question of their validity can be answered only after having specified that 
system. To date, theoretical studies of the validity of these algorithms have occurred 
almost exclusively within the conceptual framework of ordered systems, using ideas 
of metals, insulators, and band gaps [2,22-30]. In this chapter and the next I 
carefully examine the applicability of O(N) algorithms to disordered systems. I 
calculate the density matrix function in two ways: with an O(N) algorithm, and 
via diagonalization. Comparison of the two results provides some new insight into 
when disorder can make O(N) calculations feasible. 

All O(N) algorithms make three basic assumptions: 

• Existence of a Preferred, Local Basis. It is assumed that the system is best 
described in terms of a localized basis set. I here define a basis as localized if 
for any basis element $|\psi\rangle$, only a small number of positions $x$ satisfy $\langle x|\psi\rangle \neq 0$. 
In particular, plane wave bases are excluded. Note that crystal calculations 
are still possible, by using a free propagator that properly includes the crystal 
lattice structure. This theory, called KKR band theory, was developed by 
Korringa, Kohn, and Rostoker [31,32]. 

• Existence of a Distance Metric. There must be a way of computing the 
physical distance between any two basis elements. 

• Locality of both the Matrix Function and its Argument. I call a matrix A local 
if, for every pair of basis elements $|x\rangle$ and $|y\rangle$ that are far apart, $\langle x|A|y\rangle = 0$. 
Throughout this chapter and the next I choose a simple criterion for being far 
apart: comparison with a radius R. 

There are a number of ways for an O(N) algorithm to exploit the three basic 
assumptions. In this chapter and the next I focus on the class of algorithms based on 
basis truncation. This class includes Yang's Divide and Conquer algorithm [1], the 
"Locally Self-Consistent Multiple Scattering" algorithm [33,34], and Goedecker's 
"Chebyshev Fermi Operator Expansion" [35,36], which I will henceforth call the 
Goedecker algorithm. Basis truncation algorithms break the matrix function into 
spatially separated pieces. Given the position of a particular piece, the basis is 
truncated to include only elements close to that position, and then the matrix function is 
calculated within the truncated basis. Thus, for any generic matrix function $f(H)$, 
a basis truncation algorithm calculates 
$\langle x|f(H)|y\rangle \approx \langle x|f(P_{x,y} H P_{x,y})|y\rangle$, where 
$P_{x,y}$ is a projection operator truncating all basis elements far from x, y. There may 
also be an additional step of interpolating results obtained with different P's, but I 
will ignore this. In this work I choose P to be independent of the left index x, and 
truncate all basis elements outside a sphere of radius R centered at y. 
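
The column-at-a-time truncation described above can be sketched in a few lines. This is a minimal illustration, not the implementation used for the results: it assumes a one-dimensional periodic chain stored as a dense NumPy array, and it evaluates the function on the truncated block by diagonalization (in the spirit of Yang's approach) rather than by a Chebyshev expansion. The names `ring_distance` and `truncated_column` are hypothetical.

```python
import numpy as np

def ring_distance(i, j, n):
    """Distance between sites i and j on a periodic chain of n sites."""
    d = abs(i - j) % n
    return min(d, n - d)

def truncated_column(H, f, y, R):
    """Approximate column y of f(H) using only the sites within distance R of y."""
    n = H.shape[0]
    keep = np.array([i for i in range(n) if ring_distance(i, y, n) <= R])
    H_loc = H[np.ix_(keep, keep)]             # P H P restricted to the kept sites
    evals, evecs = np.linalg.eigh(H_loc)      # small dense eigenproblem
    y_loc = int(np.nonzero(keep == y)[0][0])  # position of y inside the block
    col_loc = evecs @ (f(evals) * evecs[y_loc, :])
    col = np.zeros(n)
    col[keep] = col_loc                       # elements outside the sphere stay zero
    return col
```

Each column costs a fixed amount of work that depends on R but not on the total basis size, which is where the linear scaling comes from. When R is large enough to cover the whole chain, the result coincides with the exact column.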

Basis truncation algorithms vary only in their choice of how to evaluate the 
function $f(P_{x,y} H P_{x,y})$. Specifically, Yang's algorithm calculates $f$ in terms of the 
argument's eigenvectors, the Goedecker algorithm does a Chebyshev expansion of 
$f$, and the "Locally Self-Consistent Multiple Scattering" algorithm calculates the 
argument's resolvent or Green's function and then obtains $f$ via contour integration. 
Because these approaches are all mathematically equivalent when applied to analytic 
functions, they should all converge to identical results, as long as one makes identical 
choices of which matrix function to evaluate, of how to break up the function, of 
which projection operator to use, and of a possible interpolation scheme. Moreover, 
given an identical choice of matrix function, variations in the other choices should 
obtain results that are qualitatively the same. In this work I use the Goedecker 
algorithm, but I want to emphasize that the results obtained here apply to the 
whole class of basis truncation algorithms. 

The Goedecker algorithm is essentially a Chebyshev expansion of the matrix 
function. As long as all the eigenvalues of the argument H are between $-1$ and 
$1$, a matrix function may be expanded in a series of Chebyshev polynomials of 
H: $f(H) = \sum_{s=0}^{S} c_s T_s(H)$. The coefficients $c_s$ are independent of the basis size, 
and therefore can be calculated numerically in the scalar case. The Chebyshev 
polynomials can be calculated in O(N) time using the recursion relation 
$T_{s+1} = 2 H T_s - T_{s-1}$, $T_1 = H$, $T_0 = 1$. (Of course, one must also bound the highest 
and lowest eigenvalues of H and then normalize. In practice very simple heuristics 
are sufficient for estimating these bounds.) If the matrix function $f(H)$ has a 
characteristic scale of variation $\alpha$, then the error induced by the Chebyshev expansion 
is controlled by an exponential with argument of order $-\alpha S$. 
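
The expansion and recursion above can be sketched as follows. This is an illustrative sketch under stated assumptions, not Goedecker's code: the coefficients are obtained from Chebyshev-Gauss quadrature of the scalar function, H is assumed to be already rescaled so that its eigenvalues lie in [-1, 1], and dense NumPy arrays stand in for the sparse, truncated matrices that make each multiplication cheap. The function names are hypothetical.

```python
import numpy as np

def chebyshev_coefficients(f, S):
    """Coefficients c_0..c_S of the scalar function f on [-1, 1]."""
    M = S + 1
    theta = np.pi * (np.arange(M) + 0.5) / M   # Chebyshev-Gauss nodes x = cos(theta)
    fx = f(np.cos(theta))
    c = np.array([2.0 / M * np.sum(fx * np.cos(s * theta)) for s in range(M)])
    c[0] *= 0.5                                # the s = 0 term carries half weight
    return c

def chebyshev_apply(H, v, c):
    """Return sum_s c_s T_s(H) v using T_{s+1} = 2 H T_s - T_{s-1}."""
    t_prev = v.copy()                          # T_0(H) v = v
    result = c[0] * t_prev
    if len(c) == 1:
        return result
    t_curr = H @ v                             # T_1(H) v = H v
    result = result + c[1] * t_curr
    for s in range(2, len(c)):
        t_next = 2.0 * (H @ t_curr) - t_prev   # three-term recursion
        result = result + c[s] * t_next
        t_prev, t_curr = t_curr, t_next
    return result
```

Passing a single basis vector as `v` yields one column of f(H); passing the identity matrix yields the whole matrix function. Only matrix-vector products with H are needed, so no diagonalization is involved.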

3.2 Measuring The Error 

Previous efforts to test numerically the accuracy of O(N) algorithms have almost 
always been confined to qualitative evaluations of whether their overall physical 
predictions (total energy, bond structure, etc.) are reasonable, and tests of 
convergence. The only exceptions that I am aware of are two papers examining the 
accuracy of the Green's function, which is computed as an intermediate step in 
the "Locally Self-Consistent Multiple Scattering" algorithm [3,4]. The research 
presented here sets itself apart by carefully checking basis truncation algorithms 
quantitatively in a systematic and detailed way. I computed the density matrix 
function using both an O(N) algorithm and an algorithm based on diagonalization, 
and then compared the results. 

This careful comparison required development of a metric for comparing two 
matrices. First, note that the dot product used for comparing two vectors can be 
easily generalized to matrices: 

$$\mathrm{MDP}(A,B) = \mathrm{Tr}(AB) = \sum_{x,y} \langle x|A|y\rangle \langle y|B|x\rangle$$



If A = B, this matrix dot product is just the square of the Frobenius norm, one of the 
traditional norms for matrices. Note also that the matrix dot product is invariant under 
change of basis. Moreover, it is simple to show that 

$$-1 \le \frac{\mathrm{MDP}(A,B)}{\sqrt{\mathrm{MDP}(A,A)\,\mathrm{MDP}(B,B)}} \le 1,$$

so one may define the angle $\theta$ between two matrices as the arccos of this quantity. 
Bowler and Gillan [37] justified this, showing that the concept of perpendicular and 
parallel matrices is valid and useful. 

However the matrix dot product is not quite suited to my needs. O(N) 
algorithms have a preferred, local basis, and thus are not well matched by a basis 
invariant measure. Moreover, the matrix functions which they compute are ex- 
pected to agree best with the exact matrix functions close to the diagonal, and to 
agree not at all outside the truncation radius. Therefore, a more sensitive metric 
is needed, one that distinguishes different distances from the diagonal. I define the 
Partial Matrix Dot Product as: 

$$\mathrm{MDP}(A,B,x) = \sum_{y} \langle y|A|y+x\rangle \langle y|B|y+x\rangle$$

The argument x of this dot product allows me to obtain information about 
agreement at displacement x from the diagonal. It is still valid to call this a dot 
product, because the magnitude of 

$$\frac{\mathrm{MDP}(A,B,x)}{\sqrt{\mathrm{MDP}(A,A,x)\,\mathrm{MDP}(B,B,x)}}$$

is bounded by one, and thus one can compute a displacement-dependent angle $\theta(x)$ and 
relative magnitude $m(x)$. The partial matrix dot product has a simple sum rule relating it to 
the full matrix dot product: $\mathrm{MDP}(A,B) = \sum_x \mathrm{MDP}(A,B,x)$. 

In my results I actually compute another dot product, an angular average 
$\mathrm{MDP}(A,B,r)$ over all x satisfying $r = |x|$. 
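
The two dot products and the sum rule can be checked numerically with a short sketch. It assumes a one-dimensional periodic lattice, so that the displacement x is an integer taken modulo the lattice size, and real symmetric matrices; the function names are hypothetical and the angular average is not included.

```python
import numpy as np

def mdp(A, B):
    """Full matrix dot product Tr(AB)."""
    return np.trace(A @ B)

def partial_mdp(A, B, x):
    """Partial matrix dot product: sum over y of <y|A|y+x><y|B|y+x> (periodic 1D lattice)."""
    n = A.shape[0]
    y = np.arange(n)
    return np.sum(A[y, (y + x) % n] * B[y, (y + x) % n])

def normalized_overlap(A, B, x):
    """Partial dot product divided by the two partial norms; its magnitude is at most one."""
    return partial_mdp(A, B, x) / np.sqrt(partial_mdp(A, A, x) * partial_mdp(B, B, x))
```

Summing `partial_mdp(A, B, x)` over all displacements x reproduces `mdp(A, B)` when B is symmetric, which is the sum rule quoted above.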

In this work I used a projection operator P for basis truncation which is inde- 
pendent of the left index x and truncates all basis elements outside a sphere of 
radius R centered at y. This truncation strategy was chosen because it allows the 
matrix function to be calculated one column at a time instead of one element at 
a time. However, this strategy does not respect transpose symmetry in the matrix 
function's argument, which should result in the matrix function also having 
transpose symmetry. The asymmetric truncation also destroys the matrix dot product's 
reflection symmetry $\mathrm{MDP}(A,B,x) = \mathrm{MDP}(A,B,-x)$, which holds whenever A 
and B have transpose symmetry. However, it is reassuring to note that the angular 
matrix dot product $\mathrm{MDP}(A,B,r)$ is not sensitive to this difficulty. In any case, the 
symmetry breaking effects go to zero as the truncation radius R is increased to 
infinity, and therefore are insignificant unless the O(N) algorithm would fail regardless 
of choice of truncation strategy. 

As far as I know, the first occurrence of these matrix dot products and partial 
matrix dot products in the literature was in my paper on density matrix errors [38]. I 
believe that the partial matrix dot product defined here is the appropriate metric for 
evaluating the accuracy of O(N) algorithms, and will be useful to other researchers. 

3.3 The Density Matrix 

In this chapter I restrict my attention to a single matrix function, the density matrix. 
This function is very important in quantum calculations of electronic structure in 
atoms and molecules, where its argument is the system's Hamiltonian, its diagonal 
elements describe the charge density, and its off-diagonal elements are used to 
compute forces on the atoms. Eigenvalues of the Hamiltonian give the energies of 
their corresponding electronic states, and I will use the words eigenvalue and energy 
interchangeably throughout the rest of this thesis. 

The density matrix function $\rho(\mu,T,H)$ is basically a projection operator which 
deletes eigenvectors having energy E larger than the Fermi level $\mu$. Here I use the 
following form: 

$$\rho(\mu,T,H) = \frac{1}{2}\,\mathrm{Erfc}\!\left(\sqrt{2}\,\frac{H-\mu}{T}\right)$$

For physical reasons, it is not quite a projection operator: it has a transition region 
around $\mu$ of width proportional to the temperature T where its eigenvalues 
interpolate between 0 and 1. The error induced by a Chebyshev expansion is controlled by 
an exponential with argument proportional to $-TS/\Delta$, where $\Delta$ is the size of the 
energy band and S is the number of terms in the Chebyshev expansion [35]. 
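
For reference, a smoothed projection of this kind can be evaluated exactly by diagonalization, which is the "correct but slow" route used for comparison. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the numerical factor multiplying (H - mu)/T inside the complementary error function is exposed as the parameter `scale` because its value is an assumption here.

```python
import math
import numpy as np

def density_matrix(H, mu, T, scale=math.sqrt(2.0)):
    """rho = 0.5 * erfc(scale * (H - mu) / T), evaluated by exact diagonalization.

    The default value of `scale` is an assumption made for this sketch.
    """
    evals, evecs = np.linalg.eigh(H)
    # Occupation of each eigenvector: close to 1 below mu, close to 0 above mu.
    occ = 0.5 * np.array([math.erfc(scale * (e - mu) / T) for e in evals])
    return (evecs * occ) @ evecs.T
```

At low temperature the occupations are almost exactly 0 or 1, so the result is nearly a projection operator; raising T widens the transition region around mu.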

The density matrix is well suited to O(N) algorithms. As $\mu$ becomes large, it 
converges to the identity. Moreover, it is invariant under unitary transformations 
acting on the set of eigenvectors with energies below the Fermi level [19]. Even 
when the Hamiltonian's eigenvectors are not a local basis set, often a unitary trans- 
formation can be found which maps them to a basis which is localized. If such a 
transformation exists, the density matrix is localized. 

Several papers have examined density matrix locality in the context of ordered 
systems; i.e. ones whose Hamiltonians possess lattice translational invariance [2,22-30]. 
(Lattice translational invariance can be expressed quantitatively as 
$\langle x|H|y\rangle = \langle x+\Delta|H|y+\Delta\rangle$ for all $\Delta$ located on an infinitely extended lattice.) It is well known 
that the eigenvalues of such systems are arranged in bands separated by energy gaps 
where there are no eigenvalues, and that the eigenvectors are extended through 
all space. Notwithstanding the nonlocality of the eigenvectors, there are strong 
arguments for localization in all ordered systems. If the system is metallic (meaning 
that the Fermi level lies in one of the bands of eigenvalues) and the temperature 
is zero, then in a three-dimensional system the density matrix is expected to fall 
off asymptotically as $R^{-2}$, where R is the spatial distance from the diagonal. A 
non-zero temperature multiplies this by an exponential decay. If instead the system 
is an insulator (meaning that the Fermi level does not lie in one of the bands of 
eigenvalues), then the density matrix should decay exponentially even at T = 0. 

Most systems of physical interest do not exhibit lattice translational symmetry. 
In particular, many exhibit inhomogeneities at scales much smaller than that of 
the system itself. These are termed disordered systems. In this thesis I study the 
prototypical disordered system, the Anderson model [39]. It describes a basis laid 
out on a cubic lattice, one basis element per lattice site, and a very simple symmetric 
Hamiltonian matrix H composed of two parts: 

• A regular part: $\langle x|H|y\rangle = 1$ if x and y are nearest neighbors on the lattice. 
This term is, up to a constant, just the second order discretization of the 
Laplacian; its spectrum consists of a single band of energies between —2D 
and 2D, where D is the spatial dimensionality of the lattice. 

• A disordered part: Diagonal elements $\langle x|H|x\rangle$ have random values chosen 
according to some probability distribution. In this thesis I choose to use the 
Gaussian distribution 

$$P(\langle x|H|x\rangle) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\langle x|H|x\rangle^{2}}{2\sigma^{2}}\right).$$

I call $\sigma$ the disorder strength; this is related to the disorder strength used in 
the literature by a factor of $\sqrt{12}$ [40,41]. 
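
The two parts of the Hamiltonian can be assembled as in the following sketch. It is illustrative only: periodic boundary conditions, a zero-mean Gaussian, and dense storage are assumptions made here, and `anderson_hamiltonian` is a hypothetical name.

```python
import numpy as np

def anderson_hamiltonian(width, dim, sigma, rng):
    """Dense Anderson Hamiltonian on a periodic cubic lattice with width**dim sites.

    Off-diagonal elements are 1 between nearest neighbours; diagonal elements are
    Gaussian random numbers with zero mean and standard deviation sigma.
    """
    shape = (width,) * dim
    n = width ** dim
    H = np.zeros((n, n))
    for site in np.ndindex(*shape):
        i = np.ravel_multi_index(site, shape)
        for axis in range(dim):
            neighbour = list(site)
            neighbour[axis] = (neighbour[axis] + 1) % width   # step +1 along this axis
            j = np.ravel_multi_index(tuple(neighbour), shape)
            H[i, j] = H[j, i] = 1.0                           # regular (kinetic) part
    H[np.diag_indices(n)] = rng.normal(0.0, sigma, size=n)    # disordered part
    return H
```

With `sigma = 0` only the regular part remains, and on a lattice of even width its eigenvalues span exactly the band from -2D to 2D quoted above. A production code would store H in a sparse format, since each row has only 2D + 1 non-zero entries.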

At small disorder strengths, the Anderson model is dominated by its regular part; 
in particular the eigenvectors are extended throughout the whole system volume. 
However, there is a small but important departure from the ordered behavior: at 
the band edges one finds a few eigenvectors with volumes much smaller than the 
system volume. In fact, there is an energy $E_{loc}$ such that any eigenvector with 
eigenvalue E satisfying $|E| > E_{loc}$ is localized. On average these eigenvectors 
decay exponentially with the spatial distance from their maximum [42]. As the 
disorder strength is increased, $E_{loc}$ gets smaller and smaller; i.e. more and more 
of the energy band becomes localized. At a critical disorder strength the whole 
energy band becomes localized. This phenomenon is called the Anderson transition; 
for the Gaussian probability distribution used in this thesis it occurs at the critical 
disorder $\sigma_c = 6.149 \pm 0.006$ [40]. 

Note that these statements must all be understood as regarding the ensemble 
of Hamiltonians determined by the probabilistic distribution of the disorder: for 
instance, I am stating that above the critical disorder the subset of Hamiltonians 
with unlocalized eigenvectors is vanishingly small compared to the total ensemble 
size. Moreover, these statements are valid for an infinite lattice; the mapping to 
computations on a finite lattice is not always absolutely clear. 

Studies of the locality of disordered systems have traditionally concentrated 
on computations of the Green's function, not the density matrix. It is expected 
that the average of the Green's function should decay exponentially with distance, 
with a characteristic decay length called the coherence length [43]. The density matrix, as we will see, 
is closely related to the Green's function, so one may hope that its average will 
also decay exponentially. However, there are two reasons why this hope may be 
unjustified: First, we do not need to know whether the average of all density matrices 
is localized, but instead whether each individual density matrix is localized. The 
difference between the two could be significant. Second, in a system below the 
critical disorder there will be unlocalized eigenvectors, and one might therefore 
expect the behavior typical of a metal. 

Many O(N) computations have treated systems which are disordered [27,35, 
44]. However, the O(N) literature contains little theoretical material about the 
applicability of O(N) algorithms to disordered systems. The originators of the 
"Locally Self-Consistent Green's Function" algorithm, which does not truncate the 
basis but instead does a sort of averaging outside of a radius r, suggested that 
r should be related to the coherence length [33], and also to the error induced by 
their averaging [34]. Zhang and Drabold computed the density matrix of amorphous 
silicon using exact diagonalization and found an exponential decay [27]. In the next 
sections, I will first argue theoretically and then show numerically that O(N) basis 
truncation algorithms are applicable to disordered systems, including ones far below 
the Anderson transition. 



3.4 Estimating the Error 

The quantity of interest is the relative error, 

$$e(r,R) = \frac{\mathrm{MDP}(\Delta(R),\Delta(R),r)}{\mathrm{MDP}(f,f,r)} \qquad (3.1)$$

where R is the radius of the truncation volume and $\Delta(R)$ is the difference between 
the exact matrix function $f$ and the approximate matrix function $f(R)$. 

A first guess can be made from the intuition that the absolute error 

$$E(r,R) = \mathrm{MDP}(\Delta(R),\Delta(R),r) \qquad (3.2)$$

is probably bounded by its value at the boundary of the truncation region. This 
allows a rough estimate of the relative error: $e(r,R) \lesssim E(R,R)/\mathrm{MDP}(f,f,r)$, suggesting that 
it can be made arbitrarily small if the matrix function is localized. The numerator, 
however, is left undefined. Perhaps it is reasonable to assume that on the boundary 
the absolute error is equal to the approximate matrix function, giving: 

$$e(r,R) \le \frac{\mathrm{MDP}(f,f,R)}{\mathrm{MDP}(f,f,r)} \qquad (3.3)$$
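
The absolute and relative errors can be computed directly once the exact and truncated matrix functions are available. The sketch below assumes a one-dimensional periodic lattice with integer displacement r, without the angular average; the function names are hypothetical.

```python
import numpy as np

def partial_mdp(A, B, x):
    """Partial matrix dot product at displacement x on a periodic 1D lattice."""
    n = A.shape[0]
    y = np.arange(n)
    return np.sum(A[y, (y + x) % n] * B[y, (y + x) % n])

def absolute_error(F_exact, F_approx, r):
    """Partial dot product of the difference matrix with itself at displacement r."""
    delta = F_exact - F_approx
    return partial_mdp(delta, delta, r)

def relative_error(F_exact, F_approx, r):
    """Absolute error divided by the partial norm of the exact matrix function."""
    return absolute_error(F_exact, F_approx, r) / partial_mdp(F_exact, F_exact, r)
```

A relative error of 0 means the truncated result agrees exactly at that displacement, while a value of 1 is what one obtains when the approximate matrix function vanishes there, as it does outside the truncation radius.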

The following paragraphs develop further insight into the absolute error by 
resolving $\Delta(R)$ into a multiple sum over dot products between the argument's 
eigenstates $|\psi\rangle$ and the position eigenstates $|x\rangle$. Knowledge of the normalization and 
asymptotic behavior of these states sheds some light on the magnitude of the matrix 
elements of the error: $\langle x|\Delta(R)|x+r\rangle$. 

Basis truncation algorithms separate the basis into two parts, described by two 
projection operators: $P_A$ for the part inside the localization cutoff R, and $P_B$ for the 
part outside R. $P_A$ and $P_B$ are then used to divide the argument H into two parts: a part 
$H_0 = P_A H P_A + P_B H P_B$ which leaves A and B disconnected, and a boundary 
term connecting A and B, $H_1 = P_A H P_B + P_B H P_A$. The final result of a basis 
truncation algorithm is just $P_A f(H_0) P_A$. Therefore, the error induced by a 
basis truncation algorithm is entirely due to the boundary term $H_1$. In other words, 
$P_A \Delta(R) P_A = P_A \left( f(H_0 + H_1) - f(H_0) \right) P_A$. 
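
The decomposition can be verified numerically on a small example. This sketch assumes a dense real symmetric matrix and a boolean mask marking the sites inside the truncation region; `split_hamiltonian` and `matrix_function` are hypothetical helper names.

```python
import numpy as np

def split_hamiltonian(H, inside):
    """Split H into H0 (regions A and B disconnected) and the boundary term H1."""
    n = H.shape[0]
    PA = np.diag(inside.astype(float))   # projector onto region A
    PB = np.eye(n) - PA                  # projector onto region B
    H0 = PA @ H @ PA + PB @ H @ PB
    H1 = PA @ H @ PB + PB @ H @ PA
    return PA, H0, H1

def matrix_function(H, f):
    """Evaluate f(H) for a real symmetric H by diagonalization."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * f(evals)) @ evecs.T
```

Because H0 does not couple A to B, the A block of f(H0) equals f evaluated on the A block of H alone, which is exactly what a basis truncation algorithm computes.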

For matrix functions which are analytic on a region of the complex plane which 
contains the eigenvalues of $H$ and $H_0$, an exact equation for this boundary effect can be 
easily derived from the Dyson equation. First define the Green's functions 
$G(E) = (E - H)^{-1}$, $G_0(E) = (E - H_0)^{-1}$. Then note that the matrix function can be 
obtained from the Green's function through contour integration over the complex 
energy E: $f(H) = \frac{1}{2\pi i} \oint dE\, G(E) f(E)$, where the contour must enclose the 
poles of $G$, i.e. the eigenvalues of $H$. Next, apply the Dyson equation $G = G_0 + G H_1 G_0$ twice to obtain: 

$$G = G_0 + G_0 H_1 G_0 + G_0 H_1 G H_1 G_0$$
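
The intermediate steps can be spelled out as follows. This is a sketch of one way to reach the twice-iterated form; it uses the right-handed version of the Dyson equation alongside the left-handed one, both of which follow from the definitions of $G$ and $G_0$.

```latex
\begin{align*}
G^{-1} &= E - H_0 - H_1 = G_0^{-1} - H_1
  \quad\Longrightarrow\quad G = G_0 + G H_1 G_0 = G_0 + G_0 H_1 G, \\
G &= G_0 + \bigl(G_0 + G_0 H_1 G\bigr) H_1 G_0 \\
  &= G_0 + G_0 H_1 G_0 + G_0 H_1 G H_1 G_0 .
\end{align*}
```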
This gives an exact relation between the correct Green's function $G$ of the 
untruncated argument $H = H_0 + H_1$ and the Green's function $G_0$ of the truncated 
argument $H_0$. In order to obtain a similar relation for the matrix function $f$, one 
must make the poles in this expression explicit and then do a complex integration. 
Define $|a\rangle$ and $|b\rangle$ as two eigenvectors of $H_0$ which are both located inside of the 
localization region, the set of $|c\rangle$ as the complete set of eigenvectors of $H$, and $E_a$, 
$E_b$, and $E_c$ as their respective energies. Then: 



$$\langle x|\Delta(R)|x+r\rangle = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} dE_a\, dE_b\, dE_c\; \langle x|a\rangle \langle a|H_1|c\rangle \langle c|H_1|b\rangle \langle b|x+r\rangle\; g(E_a,E_b,E_c)\; n_A(E_a)\, n_A(E_b)\, n(E_c)$$

$$g(E_a,E_b,E_c) = \frac{1}{2\pi i} \oint dE\, f(E)\, \frac{1}{E-E_a}\, \frac{1}{E-E_b}\, \frac{1}{E-E_c} \qquad (3.4)$$

$n(E)$ is the spectral density $\sum_c \delta(E - E_c)$, which is often approximated as a 
continuous function. Similarly, $n_A(E)$ is the spectral density of the eigenstates 
of $H_0$ which are located inside the truncation region. If the matrix elements and 
matrix function are well-behaved, then this integral is also well-behaved. Consider 
the integral $I = \iiint dE_a\, dE_b\, dE_c\; g(E_a, E_b, E_c)$. When $f$ is the density matrix 
and one uses a simple model with $n$ equal to a constant inside the energy band, 
the order of magnitude of this integral depends on whether $\mu$ is inside or outside 
the energy band. 

Assuming that all eigenvectors of both $H$ and $H_0$ are unlocalized, one can use 
Eq. 3.4 to derive an upper bound on $|\langle x|\Delta(R)|x+r\rangle|$ of order a power of R. In the case 
of the density matrix this is a gross overestimate. Nonetheless Eq. 3.4 can teach 
three lessons: 

3.4.1 Localized systems 

Suppose that all the eigenvectors are bounded by $\exp(-|x - x_0|/L)$, where $x_0$ is the 
point where the eigenvector has its maximum magnitude and L is the minimum 
decay length of the system. Then one can use Eq. 3.4 to prove in the limit of 
large R that $|\langle x|\Delta(R)|x+r\rangle|$ is bounded by a polynomial times an exponential 
that decays on the scale L, that the absolute error $E(r,R)$ is bounded by a polynomial 
times a similar exponential, and that the relative error can be made arbitrarily small. 
This suggests that in localized systems the absolute error $E(r,R)$ depends 
exponentially on r. If so, then Eq. 3.3 is a gross overestimate. 

3.4.2 Unlocalized systems 

If the eigenvectors are unlocalized, then the magnitudes of $\langle x|a\rangle$ and $\langle b|x+r\rangle$ have 
no strong dependence on x and r. This suggests that $|\langle x|\Delta(R)|x+r\rangle|$ is roughly 
independent of the position, and that $E(r,R)$ is roughly independent of r, thus 
providing partial justification of Eq. 3.3. 

3.4.3 Finite coherence length 

Eq. 3.4 suggests that in systems with a finite coherence length $\eta$ the absolute error 
is reduced, via reduction of the matrix elements $\langle a|H_1|c\rangle$ and $\langle c|H_1|b\rangle$. I assume a 
very crude model of the incoherence where the eigenvectors are broken into domains 
with constant phase, each domain of linear size $\eta$. The main effect is to decrease any 
integral over an eigenvector by a factor of $\sqrt{N_\eta}$, where $N_\eta$ is the number of different 
domains where the integrand is non-zero. $H_1$ touches a number of such domains that 
grows with $R/\eta$. 

Therefore if $R > \eta$, both $\langle a|H_1|c\rangle$ and $E(r,R)$ are suppressed by inverse powers of 
$R/\eta$. As I will show in section 5.4, analytic calculations of the second moment of the 
matrix element confirm the same scaling law. 

3.4.4 The Coherent Potential Approximation 

An alternative way of estimating the error is available via the Coherent Potential 
Approximation [45,46]. This approximation estimates the average of the Green's 
function, so one could use it to calculate the average Green's function in a large 
volume and also inside a truncation volume with radius R and then estimate the error 
as being equal to the difference. Having obtained the error in the average Green's 
function, one could calculate the average error in any analytic matrix function by 
doing a complex integration. Of course this approach has the weakness that one 
is estimating the average error instead of the average magnitude of the error. But 
I nonetheless began implementing a program which would calculate the Coherent 
Potential Approximation (CPA) numerically, using the IATA algorithm [47] which 
has been proved to always converge to a unique solution [48]. The only interesting 
result so far of my work has been a demonstration that the published proofs of 
convergence are wrong in the cases of finite or periodic systems. In these systems 
the bare Green's function $G_0$ (which ignores the disordered potential) has discrete 
poles. The IATA algorithm can (and, in my experience, always does) move the 
coherent potential variable toward values which correspond to those poles and make 
the bare Green's function blow up. So the coherent potential is converging (very 
very slowly) to an incorrect solution, but the bare Green's function is diverging. 
Also the solution is not unique - depending on the algorithm's initialization it may 
converge to any of $G_0$'s poles. It is likely that standard algorithms for finding zeros 
of nonlinear functions could circumvent this problem, but I have not yet tried 
this. As I will show in section 3.5 on numerical results, Eq. 3.3 gives 
a good estimate of the error, so obtaining an error estimate from the CPA is not so 
important. 

3.5 Results 

I studied ensembles of Anderson Hamiltonians at eleven disorder values between
$\sigma = 0.65$ and $\sigma = 9.00$, including the critical disorder $\sigma_c = 6.15$. In the results
presented here a truncation radius of $R = 5$ was used, but the results are
qualitatively similar to those obtained with $R = 1, 2, 3, 4$. A lattice size of $16^3$ was
used, and calculations with $12^3$ and $20^3$ lattices at the critical disorder $\sigma_c$ indicate
that finite size effects are small. The largest such effect is an improvement of the
basis truncation algorithm's accuracy at smaller lattice volumes. At each disorder
I calculated the density matrix at 13 values of the Fermi level $\mu$ ranging uniformly
from $-12$ to $12$, which covered the whole energy band at the lower disorders, and
most of it even at larger disorders. Close to the edges of the energy band the density
matrix magnitude drops precipitously and the other observables also change rapidly;
the main results reported here apply only to values of $\mu$ where the spectral density
remains high.

Figure 3.1: The top graph shows the normalized density matrix magnitude at
$\mu = 0$. Each line corresponds to a different disorder strength between 0.65 and
9.00; lower disorders are higher on the graph. The lower graph shows the ratio
of the square root of the second moment to the mean at $\mu = 0$. The different
lines correspond to different radii r = n\/i, with smaller radii lower on the
graph.

A low temperature (T = 0.05) was chosen in order to minimize any temperature 
effect. A careful examination of the density matrix's behavior at $\mu = 0$ showed that
any temperature effect was swamped by other effects. In particular, at low disorder 
the density matrix's behavior is dominated by lattice effects. Because of the low 
temperature a large number of Chebyshev terms was needed; for each matrix I used 
a number of terms S equal to 25 times the total band width, which was enough to 
make the truncation error quite small. 

The top graph of figure 3.1 shows the normalized density matrix magnitude
MJP(p'p'oj at = 0. For r > and a > 1.65 a good fit to this quantity 
can be obtained by 7r^'' exp (^^), where the coherence length is given by i? = 

$(0.057\sigma - 0.089 - 0.064\sigma^{-1})^{-1}$ and $\gamma$ is a normalization constant. Note the almost
inverse relation between the coherence length $R$ and the disorder strength $\sigma$. Lattice
effects cause a systematic uncertainty in the first constant 0.057 of roughly 30%.
Similar fits can be obtained at other Fermi levels within the band $|\mu| < 6$; the first
constant has a minimum at $\mu = 0$ and a total variation of about 30%.

Figure 3.2: Important volume scales. Each volume measure was averaged across
the ensemble and across a small interval of the energy spectrum. Shown here
are the maximum values of these averages.

Now I consider the statistical distribution of the density matrix magnitude. The
lower graph of figure 3.1 shows the ratio of the square root of the second moment
to the mean. Note that this ratio seems to grow roughly exponentially with the
disorder $\sigma$. An examination of the kurtosis (the normalized fourth moment of the
statistical distribution) of the density matrix magnitude shows that for $\sigma > 5.15$ this
quantity becomes very large, starting at larger radii and larger Fermi levels. These
statistics suggest that at the Anderson transition the statistical distribution of the
density matrix magnitude develops a long tail; it loses its self-averaging property.

Figure 3.2 shows two measures of eigenvector volume. The first is the inverse
of the first participation ratio; i.e. $\left(\sum_x |\langle\psi|x\rangle|^2\right)^2 / \sum_x |\langle\psi|x\rangle|^4$. This quantity is
a lattice friendly measure of volume because it has a minimum value of one when
$\langle\psi|x\rangle$ is a delta function and a maximum value of the system size when $\langle\psi|x\rangle$ is a
constant. Figure 3.2 shows that this volume becomes smaller than the truncation
volume in the range $\sigma = 3.00$ to $\sigma = 4.00$.
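
The first volume measure can be illustrated with a minimal Python sketch. It assumes the eigenvector is stored as a flat NumPy array of amplitudes on the lattice sites; the function name `participation_volume` and the two test vectors are hypothetical and are not taken from the thesis software.

```python
import numpy as np

def participation_volume(psi):
    """Inverse participation ratio volume: (sum |psi|^2)^2 / sum |psi|^4."""
    p = np.abs(np.asarray(psi)) ** 2   # site probabilities
    return p.sum() ** 2 / (p ** 2).sum()

n_sites = 16 ** 3                      # same size as a 16^3 lattice

# A state concentrated on a single site (lattice delta function).
delta = np.zeros(n_sites)
delta[0] = 1.0

# A state spread uniformly over the whole system.
uniform = np.full(n_sites, 1.0 / np.sqrt(n_sites))

print("delta function volume:", participation_volume(delta))
print("uniform state volume: ", participation_volume(uniform))
```

The two limiting cases reproduce the bounds quoted above: a volume of one for the delta function and a volume equal to the number of sites for the constant state.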

-1 

My second volume measure is based on the second moment: ((27r) detQ) ^ 
, where Q is the second moment (or quadrupole tensor) of . Unlike the 

first volume measure, this measure remains large even at large disorders, indicating 
that each eigenfunction consists of several isolated peaks scattered throughout the 
system volume. This structure is caused by the fact that states with similar energies 
will mix even if they are connected by exponentially small matrix elements. However, 
mixing caused by such small matrix elements does not influence the density matrix, 
because it essentially just induces a unitary transformation of the mixed eigenvec- 
tors, and as we know the density matrix is invariant under unitary transformations 
between states whose energies are all either below or all above $\mu$.

Figure 3.2 also shows the maximum value of the coherence volume, $V_c =
\max_E V(E)$, where $V(E)$ is the coherence volume. I calculated this volume by

-*D ~* 1 -> 

first computing the correlation function C(i;) = ^ dk exp(«fc • x), and then 
applying my two volume measures to C^(x). Taking the maximum value resolves an
important ambiguity: at disorder strengths $\sigma < 6.15$ the coherence length shows two
peaks at energies = 6 - 8. These peaks become very pronounced at $\sigma < 1.65$,
where they are a factor of 10 above the minimum. Note that the coherence volume
becomes small much sooner than the eigenvector volume, in the range from $\sigma = 1.65$
to $\sigma = 2.65$. For $\sigma > 1.15$ and Vc ^ 1 it is roughly proportional to a~^. This fits
well with the density matrix's coherence length at large $\sigma$, but not at small $\sigma$.

Figure 3.3: The upper graph shows the relative error $e(r, R)$ (eq. 3.1) as a
function of disorder, while the lower graph shows the absolute error $E(r, R)$
(eq. 3.2) as a function of disorder, at $\mu = 0$.

Figure 3.3 shows the relative error and absolute error as a function of disorder.
The $O(N)$ algorithm begins to work well at $r = 0, 1$ at quite small disorders, and
the $r = 2$ relative error falls to 1% at about $\sigma \approx 3$. Clearly the $O(N)$ algorithm's
success is controlled by the coherence volume, not by the eigenvector volumes.

The overlapping lines in the lower graph of figure 3.3 (the absolute error) confirm
section 3.4.2's argument that the absolute error $E(r,R)$ is roughly independent of
$r$, except at $r = 0$. The magnification at $r = 0$ is probably caused by the density
matrix's close relationship to the identity matrix. It is more than compensated for
by a corresponding magnification of the density matrix at $r = 0$, so that the upper
graph (the relative error) shows that the value of the relative error $e(r, R)$ at $r = 0$
is smaller than its value at $r = 1$.

Section 3.4.3 suggested that a small coherence length may cause a decrease

4 

in the absolute error E{r,R) of order The R — 5 line in the relative error 
graph of figure 3.3 shows that the decrease is actually even more pronounced: at
the boundary $r \approx R$, $E(r, R)$ is of the same order of magnitude as $MDP(f, f, r)$,
which is controlled by an exponential. This is just the ansatz used to obtain Eq.
3.3; my results fully support the validity of Eq. 3.3 except, as discussed before,
at $r = 0$ and at the edges of the energy band. Therefore accurate estimates of
the error of an $O(N)$ calculation may be obtained from the results of the $O(N)$
calculation itself.

Section 3.4.1 showed that for large $R$ the absolute error $E(r,R)$ is bounded
by a polynomial times cxp (— where L is the decay length. This suggests 

an $r$ dependence which is not confirmed by the absolute error graph of figure 3.3,
where it would cause a splitting of the lines at $\sigma > 6.15$. However, the previous
paragraph showed that E{r,R) actually depends on R as R^^exp{^^), where 

$R$ is the coherence length. Therefore as long as $R < 2L$, the bound obtained in
section 3.4.1 is automatically verified at all $r$. Moreover at small $R$ the unknown
polynomial in the bound may mask any $r$ dependence.



3.6 Qualitative Reliability of These Results 

I have already discussed numerical errors due to the finite lattice size and to the
truncation of the Chebyshev expansion. I will discuss these errors in more depth in
section 3.7, but my checks indicated that they have at most a quantitative, not qualitative,
effect on the results presented here. The main risks to the qualitative correctness of 
these results probably lie in two areas: finite ensemble size, and software reliability 
and reproducibility. I have taken steps to manage both issues: 

3.6.1 Finite ensemble size. 

All results presented here were obtained from ensembles of 33 realizations, but 
repeating some of the calculations with ensembles of 100 realizations yielded the 
same results. At the critical disorder the same quantities were computed with three 
different lattice sizes (100 realizations at both $12^3$ and $16^3$, and 10 realizations
at $20^3$), and the agreement is very good. Graphing any quantity across several
disorders, one immediately notices that there is little noise induced overlap of the 
two graphs. Therefore, it seems likely that risks due to finite ensemble size are 
under control. 



3.6.2 Software Reliability and Reproducibility. 

I have tried very hard to reduce this risk, and discuss these efforts in some detail
in chapter 2. The software includes an automated test suite which tests all
computational functions except the highest level output (graph printing) routines. 
Moreover, I have taken pains to enable other researchers to easily reproduce and 
check my results, simply by installing my software and the libraries it depends on, 
compiling it with the GNU gcc compiler [49], and starting it running. The software, 
with all needed configuration files, is available under the GNU Public License [7]; 
check www.sacksteder.com for further details. 






3.7 Quantitative Error Analysis 

A full quantitative analysis of $O(N)$ error would include an error budget: how much
error came from the finite size effect, from the Chebyshev expansion, from the
$O(N^3)$ algorithm, and from statistical uncertainty caused by the finite sample size.
In the next sections I argue that the errors from the Chebyshev expansion and the
$O(N^3)$ algorithm are small, and I discuss the statistical uncertainty and the finite
size effect.



3.7.1 The Finite Size Effect

In this chapter I calculated the density matrix for a $16^3$ system with both an $O(N)$
algorithm and an $O(N^3)$ algorithm, and I presented a detailed calculation of the
differences in results. However, the ideal would be to go a step further and figure
out how the $O(N)$ error depends on the system size. I already mentioned that
I computed several observables at the critical disorder with three different system
sizes, and obtained good agreement. However there was a systematic effect that
the error improved as the system size decreased, which was caused by the fact
that basis truncation methods become exact once the truncation volume is larger
than the system volume. Quantification of this effect, which I will call the finite
size effect, would answer questions like "How much worse would the $O(N)$ errors
be if the system were of size $1000^3$ instead of $16^3$?" I present here a theoretical
foundation for numerical estimates of the finite size effect. While I have not yet
been able to carry out the numerical work prescribed by this theory, it is already
clear that if the $O(N)$ error is controlled by an exponential then the finite size effect
is also controlled by an exponential.

The key intellectual step in this theory is to imagine that the disordered system
is really infinitely big. This puts my exact calculation of a matrix function $f$ in a $16^3$
system on the same level as the $O(N)$ calculation: both results were obtained by
truncating the basis of the (imaginary) infinite system. The quantity of interest is
therefore the derivative of the matrix function with respect to the change in radius of
the truncation volume. I will use the symbol $\sigma$ to represent the disorder realization
of the infinite system, and use the function $\delta(R,\sigma) = -\frac{d}{dR} f(R,\sigma)$ to represent the
derivative of the matrix function $f$ with respect to the truncation radius $R$. $\delta(R, \sigma)$
can be computed by simply calculating the matrix function at two nearby localization
radii. In equation 3.2 I defined the absolute error $E(r, R) = MDP(\Delta(R), \Delta(R), r)$.
$\Delta(R,\sigma)$ is the difference between the exact matrix function and the approximate
matrix function, and can be rewritten as $\Delta(R,\sigma) = \int_R^\infty dR_1\, \delta(R_1,\sigma)$. Substituting
this into equation 3.2 I obtain the true error:

$$\begin{aligned} E(r,R) &= \int_R^\infty dR_1 \int_R^\infty dR_2\, MDP(\delta(R_1,\sigma), \delta(R_2,\sigma), r) \\ &\le \int_R^\infty dR_1 \sqrt{|MDP(\delta(R_1,\sigma), \delta(R_1,\sigma), r)|} \times \int_R^\infty dR_2 \sqrt{|MDP(\delta(R_2,\sigma), \delta(R_2,\sigma), r)|} \end{aligned} \qquad (3.5)$$

The finite size effect is caused by the fact that numerical studies do not compute the
true error $E(r,R)$ but instead the difference $E(r,R) - E(r,R_V)$, where $R_V$ is the
system's radius. Therefore the finite size effect can be calculated by computing the
ensemble average of $MDP(\delta(R_1), \delta(R_2), r)$ over some range of radii, extrapolating
these results out to $R = \infty$, and then doing the integration over $R_1$ and $R_2$.

My numerical results for the density matrix $\rho$ showed that:

$$E(r, R) = MDP(\Delta(R), \Delta(R), r) \approx MDP(\rho, \rho, R) \qquad (3.6)$$

This result was essential to the correctness of the error estimate given by equation
3.3. Moreover, I found that $MDP(\rho, \rho, r) \approx$ 7r^''' exp (^^^). Therefore the
magnitude of $\Delta(R)$ is proportional to 7r^^exp(^). Similarly, the magnitude of
$\delta(R)$, the derivative of $\Delta(R)$ with respect to $R$, is also controlled by the
exponential exp(^). Plugging this result into equation 3.5's bound immediately
shows that the finite size effect is also controlled by an exponential. When the
$O(N)$ error is exponentially small, the finite size effect is also exponentially small.

This analysis ignored one detail: in my calculations the $16^3$ system had periodic
boundary conditions. If the system were really part of a larger, infinite system,
as I assumed for my analysis of the finite size effect, it would have fixed boundary
conditions. However, this difference of boundary conditions is just a change in the
matrix elements at the boundary of the $16^3$ system. My numerical results demonstrated
that the density matrix's dependence on things far away is exponentially
small; therefore the effect of boundary conditions is also exponentially small. In any case
this difficulty can be avoided by choosing fixed boundary conditions for the system.

3.7.2 Statistical Uncertainty 

I computed the standard deviation and kurtosis (related to the fourth moment) of
the matrix dot products and the absolute and relative errors, and spent a lot of
time graphing them and thinking about them. The most interesting of these results
is reported in figure 3.1. That graph is the exception, however; in general each
observable's standard deviation was small compared to the observable itself. There
is also an interesting pattern in the kurtosis of certain observables, which becomes
inconsistent with a Gaussian distribution as the disorder grows, starting at large
radii and large Fermi levels $|\mu|$. In fact the statistical distribution of observables
in disordered systems is not guaranteed to be Gaussian, and may have a long tail.
Therefore a full quantitative analysis of the statistical uncertainty will require use
of a jackknife, bootstrap, or similar method.
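
As an illustration of what such an analysis might look like, here is a minimal bootstrap sketch in Python. It is not part of the thesis software: the function `bootstrap_error`, the number of resamples, and the synthetic long-tailed sample (33 values, matching the ensemble size used here) are all assumptions made for the example.

```python
import numpy as np

def bootstrap_error(samples, statistic=np.mean, n_resamples=2000, seed=0):
    """Bootstrap estimate of the standard error of `statistic`.

    The ensemble is resampled with replacement, the statistic is
    recomputed on each resample, and the spread of those values is
    returned. No Gaussian assumption is made about the samples.
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    idx = rng.integers(0, n, size=(n_resamples, n))   # resampled indices
    estimates = np.array([statistic(samples[row]) for row in idx])
    return estimates.std(ddof=1)

# Synthetic stand-in for one observable measured on 33 disorder
# realizations, drawn from a long-tailed (log-normal) distribution.
rng = np.random.default_rng(1)
observable = rng.lognormal(mean=0.0, sigma=1.5, size=33)

print("ensemble mean:      ", observable.mean())
print("bootstrap std error:", bootstrap_error(observable))
```

Because the bootstrap only resamples the measured values, the same function can be applied to the median or to a ratio of moments by passing a different `statistic`.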

3.7.3 Errors from the Chebyshev Expansion 

The error induced by truncating a Chebyshev expansion to the first $S$ terms is
controlled by an exponential with argument proportional to $-TS/A$, where $A$ is
the size of the energy band [35]. In my calculations I used $S \approx 25A$ and $T = 0.05$,
giving this fraction a value of $-1.25$. I believe that this value reduced Chebyshev
errors to an insignificant level. I base this belief on a set of earlier calculations
where I used a number of terms $S$ which was constant rather than proportional
to the band width $A$. At small disorders, and thus small band widths, there
was no difference between results using $S = 500$ terms and results obtained using
$S = 1000$ terms. However, as the band width increased with disorder I began to see
a difference, meaning that the Chebyshev truncation became the dominant source
of error. Moreover, the error began increasing as the disorder (and band width)
were increased. Therefore I redid all my calculations with the conservative heuristic
$S = 25A$.

The Chebyshev expansion also introduces errors through finite precision arithmetic.
I am unaware of any published work analyzing the effects of finite precision
arithmetic on the stability and accuracy of Chebyshev expansions, so I don't know
how to estimate these errors.

In section 3.1 I briefly mentioned that there are other basis truncation
algorithms that can be used as alternatives to the Goedecker algorithm discussed
here. These alternatives avoid the Chebyshev expansion and therefore their error
may be easier to analyze. On the other hand, they introduce other sources of error, 
which may also be difficult to analyze. 

In any case, probably the best practical way of estimating the Chebyshev ex- 
pansion's error is simply to measure how the density matrix varies while increasing 
the number of terms. As I already mentioned, the density matrix should converge 
exponentially, so one can fit an exponential to the error. As long as the error due 
to truncating the expansion is large compared to the error due to finite precision 
arithmetic, an exponential fit should give a good estimate of the error. However, 
finite precision arithmetic will cause the expansion to converge to a (slightly) wrong 
result, and the exponential fit will not be any help in measuring this effect. 
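
The procedure described in this paragraph can be sketched for a scalar function. The example below is only an illustration under several assumptions: it uses Chebyshev interpolation from NumPy (`Chebyshev.interpolate`) as a stand-in for a truncated Chebyshev expansion, it works with the scalar Fermi function rather than with matrix elements of the density matrix, and the band $[-4, 4]$ and the list of term counts are arbitrary choices.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def fermi(E, mu=0.0, T=0.05):
    # tanh form of 1 / (1 + exp((E - mu) / T)); avoids overflow in exp
    return 0.5 * (1.0 - np.tanh(0.5 * (E - mu) / T))

def truncation_error(n_terms, band=(-4.0, 4.0)):
    """Maximum error of an n_terms Chebyshev approximation on the band."""
    series = Chebyshev.interpolate(fermi, n_terms - 1, domain=band)
    E = np.linspace(band[0], band[1], 4001)
    return np.max(np.abs(series(E) - fermi(E)))

n_terms = np.array([100, 150, 200, 250, 300])
errors = np.array([truncation_error(n) for n in n_terms])

# Fit log(error) = intercept + slope * n_terms, i.e. an exponential decay.
slope, intercept = np.polyfit(n_terms, np.log(errors), 1)

for n, err in zip(n_terms, errors):
    print(f"S = {n:4d}   max error = {err:.3e}")
print("fitted decay rate per term:", -slope)
```

In a real calculation the fit would only be trusted while the measured error stays well above the level set by finite precision arithmetic, as noted above.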

Of course there are complications in doing the exponential fit. One should really 
fit every matrix element of the density matrix, and then perform some sort of average 
of the fitting parameters. Moreover, even though the error is guaranteed to decrease 
exponentially, the exponential decay may be multiplied by some oscillatory behavior, 
which can cause difficulties in the fitting. At present my program generates data
about how the absolute error varies with the number of terms in the Chebyshev
expansion, but I have not yet found the time to implement and test the exponential
fit or the averaging.

3.7.4 Errors in the "Exact" $O(N^3)$ Algorithm

I used the simple algorithm of diagonalizing the argument $H = U D U^\dagger$ and then
evaluating the matrix function $f(H) = U f(D) U^\dagger$. It has been reported in the
numerical analysis literature [50,51] that for general arguments $H$ this algorithm
is often not the best because errors in the eigenvalues can be multiplied by the
square of the condition number of the transformation $U$. However, the Anderson
Hamiltonian has transpose symmetry, so $U$ is unitary, implying that the condition
number is one. Therefore this simple algorithm is fairly well suited to my needs.
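
A minimal sketch of this diagonalization approach, written with NumPy rather than with the LAPACK calls used in the thesis software, is given below. The one-dimensional periodic chain, the Gaussian site energies, and the hopping amplitude of $-1$ are assumptions chosen only to keep the example small; the function names are hypothetical.

```python
import numpy as np

def matrix_function(H, f):
    """Evaluate f(H) = U f(D) U^dagger for a Hermitian matrix H."""
    E, U = np.linalg.eigh(H)          # H = U diag(E) U^dagger
    return (U * f(E)) @ U.conj().T    # scales column j of U by f(E_j)

def anderson_chain(n_sites, sigma, seed=0):
    """Toy 1D Anderson Hamiltonian (assumed model, periodic boundaries)."""
    rng = np.random.default_rng(seed)
    H = np.diag(rng.normal(0.0, sigma, n_sites))   # random site energies
    hop = -np.ones(n_sites - 1)
    H += np.diag(hop, 1) + np.diag(hop, -1)        # nearest-neighbor hopping
    H[0, -1] = H[-1, 0] = -1.0                     # close the ring
    return H

def fermi(E, mu=0.0, T=0.05):
    return 0.5 * (1.0 - np.tanh(0.5 * (E - mu) / T))

H = anderson_chain(64, sigma=2.0)
rho = matrix_function(H, fermi)       # density matrix at mu = 0, T = 0.05
print("trace of density matrix:", np.trace(rho))
```

The cost is dominated by the call to `eigh`, which scales as the cube of the matrix size; this is the reason the method serves only as a reference for small systems.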

I used the LAPACK [52] routine dsyevr to diagonalize the Hamiltonian, and
checked its results by computing the eigenvector orthogonality relation $\langle\psi_i|\psi_j\rangle = \delta_{ij}$
and the eigenvalue equation $\langle\psi_i|H|\psi_j\rangle = E_i\delta_{ij}$. One thing I noticed was that
for $i \neq j$ the eigenvalue equation was violated by errors roughly proportional to
$(E_i - E_j)^{-1}$. The proportionality constant was of course quite small, but if the
eigenvalues were nearly degenerate then the violation could be important. This
effect motivated a move from single precision arithmetic to double precision
arithmetic. It is well known that in unlocalized disordered systems the eigenvalues repel
each other and degeneracies are avoided, so that this effect is likely not a problem.

One non-trivial check available for any matrix function calculation is the trace
identity $\mathrm{Tr}(f(H)) = \mathrm{Tr}(f(D))$, which follows from the identity $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$.
I checked this identity for every density matrix I calculated and threw an error
whenever the relative error was more than 10"^. There was only one matrix that exceeded
that limit, with a relative error of approximately 3 x 10~ .

In any case my numerical results clearly showed that the $O(N)$ results were
converging to the $O(N^3)$ results, which strongly indicates that errors within the
$O(N^3)$ algorithm did not dominate the final results.
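
The consistency checks described in this section can be sketched as follows. This is an illustration only: it uses `numpy.linalg.eigh` on a random symmetric matrix instead of dsyevr on an Anderson Hamiltonian, and the tolerance `TOLERANCE` is a hypothetical value, not the threshold used in the thesis.

```python
import numpy as np

def diagonalization_residuals(H):
    """Diagonalize H and measure how well the results satisfy
    <psi_i|psi_j> = delta_ij and <psi_i|H|psi_j> = E_i delta_ij."""
    E, U = np.linalg.eigh(H)
    n = H.shape[0]
    ortho = np.abs(U.conj().T @ U - np.eye(n)).max()
    eigen = np.abs(U.conj().T @ H @ U - np.diag(E)).max()
    return E, U, ortho, eigen

def trace_relative_error(f_of_H, f_of_E):
    """Relative violation of Tr f(H) = sum_i f(E_i)."""
    exact = np.sum(f_of_E)
    return abs(np.trace(f_of_H) - exact) / abs(exact)

# Random real symmetric test matrix (stand-in for a Hamiltonian).
rng = np.random.default_rng(0)
M = rng.normal(size=(50, 50))
H = 0.5 * (M + M.T)

E, U, ortho, eigen = diagonalization_residuals(H)
f_E = np.exp(-E ** 2)                 # arbitrary smooth test function
f_H = (U * f_E) @ U.T                 # f(H) = U f(D) U^T

TOLERANCE = 1e-10                     # hypothetical threshold
rel = trace_relative_error(f_H, f_E)
if rel > TOLERANCE:
    raise RuntimeError(f"trace identity violated: {rel:.3e}")
print("orthogonality residual:", ortho)
print("eigenvalue residual:   ", eigen)
print("trace relative error:  ", rel)
```

Raising an exception when the trace identity fails mirrors the practice described above of stopping the calculation rather than silently accepting a suspect matrix.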
Chapter 4

Accuracy of $O(N)$ Algorithms for the Logarithm, Exponential, Inverse, Cauchy Distribution, and Gaussian Distribution

4.1 The Functions 

In the last chapter I examined the accuracy of $O(N)$ basis truncation algorithms
for evaluating the density matrix in disordered systems. In this chapter I apply the
same analysis to five other matrix functions:

4.1.1 The Natural Logarithm 

This function is non-analytic at the origin and acquires an imaginary part if the
argument is negative. If the logarithm's argument is real then the logarithm itself can
be written as $\ln x = \ln|x| + i\pi\Theta(-x)$. The reader will recognize that the theta function in
the imaginary part is just the density matrix I have already examined in so much
detail. Therefore I only calculate the real part of the logarithm, $\ln|x|$. Furthermore
I regularize the singularity by using the following formula:

$$f(|x-\mu|) = \begin{cases} \ln|x-\mu|, & |x-\mu| > T \\ \frac{1}{2}\left[\left(\frac{x-\mu}{T}\right)^2 - 1\right] + \ln T, & |x-\mu| < T \end{cases} \qquad (4.1)$$

Just as with the density matrix, the parameter $\mu$ simply displaces the matrix function
along the real axis. I will continue to call $\mu$ the Fermi level for lack of a more
descriptive name. The temperature $T$ again determines the energy resolution of the
matrix function. Just as with the density matrix, I choose $T = 0.05$ and vary the
Fermi level $\mu$ between $-6$ and $6$ in steps of 2.

Both this regularized logarithm and its derivative are continuous. Two factors
forced me to regularize. First, the unregularized matrix logarithm could be extremely
sensitive to eigenvalues which lie near $\mu$. In computations with finite matrices, such
eigenvalues become exceedingly rare and cannot be studied using numerical techniques.
Calculating the unregularized matrix function would not have given any information
about the accuracy of $O(N)$ methods in calculating eigenvalues near $\mu$. Second, the
Goedecker algorithm used here is based on the Chebyshev expansion, which always
regularizes singularities and discontinuities at a scale which is inversely related to
the number of terms in the expansion. I chose to impose a regularization of my own
choice rather than rely on the Goedecker algorithm for the regularization. This is a
limitation of the Goedecker algorithm only; other basis truncation $O(N)$ algorithms
do not impose any regularization and are able to calculate eigenvalues which are
arbitrarily close to $\mu$.

Note that if an 0{N) method can accurately calculate a regularized matrix 
function, then one can create a hybrid algorithm which accurately calculates the 
unregularized function. This hybrid algorithm would use a Lanczos-based approach 
to calculate the difference between the regularized and unregularized matrix func- 
tions. 



4.1.2 The Inverse 

The inverse function is the derivative of the real part of the logarithm. Like the
logarithm, it is nonanalytic at the origin. Therefore I regularize the inverse function,
setting it equal to the derivative of the regularized logarithm:

$$f(x-\mu) = \begin{cases} \frac{1}{x-\mu}, & |x-\mu| > T \\ \frac{1}{T}\left(\frac{x-\mu}{T}\right), & |x-\mu| < T \end{cases} \qquad (4.2)$$

I choose $T = 0.05$ and vary the Fermi level between $-6$ and $6$ in steps of 2. This is
the real part of the inverse in the limit where the argument has only a very small
imaginary part. The imaginary part of the inverse is just the Cauchy distribution,
described next.
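
The two regularized scalar functions can be written down in a few lines of Python. The sketch assumes that the regularization is the quadratic form that keeps both the logarithm and its derivative continuous at $|x - \mu| = T$, as the text requires; the function names `reg_log` and `reg_inverse` are hypothetical.

```python
import numpy as np

def reg_log(x, mu=0.0, T=0.05):
    """Regularized ln|x - mu|: quadratic inside |x - mu| <= T."""
    d = np.abs(np.asarray(x, dtype=float) - mu)
    safe = np.maximum(d, T)            # keeps log() away from zero
    inner = 0.5 * ((d / T) ** 2 - 1.0) + np.log(T)
    return np.where(d > T, np.log(safe), inner)

def reg_inverse(x, mu=0.0, T=0.05):
    """Derivative of reg_log: 1/(x - mu) outside, linear inside."""
    s = np.asarray(x, dtype=float) - mu
    outside = np.abs(s) > T
    safe = np.where(outside, s, T)     # avoids division by zero
    return np.where(outside, 1.0 / safe, s / T ** 2)

T = 0.05
for x in (-0.5, -0.05, 0.0, 0.02, 0.05, 0.5):
    print(f"x = {x:6.2f}   reg_log = {float(reg_log(x)): .4f}"
          f"   reg_inverse = {float(reg_inverse(x)): .4f}")
```

At $|x - \mu| = T$ the inner branch evaluates to $\ln T$ and its slope to $\pm 1/T$, so both functions join the unregularized ones without a jump.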

4.1.3 The Gaussian and the Cauchy Distribution 

These two functions are interesting because they are regularizations of the Dirac
delta function. I define them as follows:

$$f(|x-\mu|) = \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{T}\right)^2\right) \qquad (4.3)$$

$$f(|x-\mu|) = \left((x-\mu)^2 + T^2\right)^{-1} \qquad (4.4)$$

As with the other functions, I varied the Fermi level $\mu$ between $-6$ and $6$ in
steps of 2. However, I chose a bigger temperature $T = 0.3$. The reason for this
choice is that in the case of the Gaussian and Cauchy distribution $T$ defines both
the largest energy scale and the smallest energy scale, while for the other functions
$T$ defines only the smallest energy scale and the largest energy scale is the band
width. Therefore it seemed reasonable to choose a somewhat larger energy scale.

All of the functions discussed so far are either symmetric or antisymmetric under
the parity transformation $x - \mu \to \mu - x$. This implies that the statistical
distributions which I measure should be symmetric under $\mu \to -\mu$.

4.1.4 The Exponential 

I define the exponential:

$$f(x) = e^{ax} \qquad (4.5)$$

There is only one parameter $a$, a multiplier which controls the energy scale. I varied
$a$ from 0.1 to 3.1 in steps of 0.25, which corresponded to changing the scale of the
exponential's variation between 10 and $\approx 0.3$.



4.2 Results 

All of the results discussed here are obtained only for values of $\mu$ which lie inside
the energy band. On the edges of the energy band, and outside it, fitting becomes
difficult and the relative error decreases very rapidly (except in the case of the
Gaussian, where the relative error converges to one). The graphs and numerical fits
are all at $\mu = 0$; the results are qualitatively the same at other values of $\mu$ inside
the energy band.

I computed all quantities at eleven different values of the disorder varying between
$\sigma = 0.65$ and $\sigma = 9.0$. At the lowest disorder value, $\sigma = 0.65$, all quantities
display a rapid variation with the position $r$. I call this effect, which is even more
pronounced at zero disorder, the lattice effect. There is a transition as the disorder
is increased to $\sigma = 1.15$ and $\sigma = 1.65$: these rapid variations smooth out and
the $O(N)$ algorithms obtain good accuracy for the density matrix. All observables
which I calculate tend to maintain their qualitative features across all disorders
above $\sigma = 1.65$. In particular, the localization transition, which occurs at $\sigma = 6.15$,
has little effect on the results.

I remind the reader that in the previous chapter I defined the Partial Matrix Dot
Product, a new metric for measuring matrices in localized systems:

$$MDP(A, B, \vec{x}) = \sum_{\vec{y}} \langle\vec{y}|A|\vec{y}+\vec{x}\rangle \langle\vec{y}|B|\vec{y}+\vec{x}\rangle$$

This metric allows one to compute both magnitudes of matrices and angles between
matrices as a function of the displacement $\vec{x}$ from the matrix diagonal. In my
computations I actually calculate the angular average $MDP(A,B,r)$, which is also
a dot product.
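
To make the definition concrete, the following Python sketch evaluates the Partial Matrix Dot Product and the resulting relative error on a one-dimensional periodic lattice. The restriction to one dimension (so that no angular average is needed), the real-valued matrices, and the synthetic "exact" and "approximate" matrices are all assumptions made for the illustration.

```python
import numpy as np

def mdp(A, B, x):
    """sum_y <y|A|y+x> <y|B|y+x> on a periodic 1D lattice."""
    n = A.shape[0]
    y = np.arange(n)
    cols = (y + x) % n                 # periodic displacement by x
    return np.sum(A[y, cols] * B[y, cols])

def relative_error(exact, approx, r):
    """MDP(Delta, Delta, r) / MDP(f, f, r) with Delta = exact - approx."""
    delta = exact - approx
    return mdp(delta, delta, r) / mdp(exact, exact, r)

# Synthetic matrix whose elements decay with distance along the ring.
n, R = 16, 3
i, j = np.indices((n, n))
dist = np.minimum(np.abs(i - j), n - np.abs(i - j))
exact = np.exp(-dist / 2.0)

# Crude stand-in for a truncated result: drop elements beyond radius R.
approx = np.where(dist <= R, exact, 0.0)

for r in range(6):
    print(f"r = {r}   magnitude = {mdp(exact, exact, r):.4e}"
          f"   relative error = {relative_error(exact, approx, r):.2f}")
```

In this toy case the relative error jumps from zero to one at the truncation radius; a real basis truncation calculation also perturbs the elements inside the radius, which is what the absolute error measures.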

Figure 4.1: The normalized matrix magnitude at small disorder $\sigma = 1.15$.

I defined the error matrix $\Delta(R)$ as the difference between the exact matrix function
$f$ and the approximate matrix function $f(R)$. An $O(N)$ algorithm's accuracy
is measured in terms of the relative error:

$$e(r, R) = \frac{MDP(\Delta(R), \Delta(R), r)}{MDP(f, f, r)} \qquad (4.6)$$

Therefore the matrix magnitude $MDP(f, f, r)$ and the absolute error $MDP(\Delta(R), \Delta(R), r)$
are also of interest. My numerical results showed that, in the case of the density
matrix, the absolute error is roughly invariant throughout the $O(N)$ algorithm's
truncation volume; i.e. it doesn't depend on the coordinate $r$. Therefore the question
of $O(N)$ accuracy reduced to examining how the matrix magnitude
$MDP(f, f, r)$ falls off with increasing displacement $r$ from the diagonal.
The questions to be addressed for other matrix functions are: 

• What is the matrix magnitude's dependence on r? 

• When does the relative error become small? 

• Is the absolute error roughly invariant throughout the truncation volume? 

• Is the relative error of order one at the boundary of the truncation volume? 

If the answers to the third and fourth questions are yes, then the following
formula is useful for estimating the relative error:

$$e(r, R) \lesssim \frac{MDP(f, f, R)}{MDP(f, f, r)} \qquad (4.7)$$



4.2.1 The Matrix Magnitude 

Figure 4.1 shows the magnitudes of the various matrix functions at a low disorder
$\sigma = 1.15$. This graph is typical of results at small disorders. In particular, the
$\sigma = 0.65$ graph is the same, except that:

• The lattice effect is much bigger, causing much bigger oscillations in the
graphs.

• The rise in the inverse function at large $r$, barely perceptible for $\sigma = 1.15$, is
much larger.

• The Cauchy and logarithm results lie on top of each other.

• The matrix magnitude is slightly bigger.

Figure 4.2: The normalized matrix magnitude at intermediate disorder $\sigma = 4.65$.

I choose to print the $\sigma = 1.15$ results because the lattice effect is smaller and
therefore the graph is prettier. Qualitatively the same trends remain for all small
disorders:

• The exponentials do not decay very quickly at small $r$, but as $r$ increases they
decrease precipitously. Smaller values of the constant $a$ decay more quickly.
The exponential's fast decrease is probably caused by the fact that the matrix
exponential is dominated by eigenfunctions at the band edge, which are of
course localized in the Anderson model. Possibly the trend to fast decay at
small $a$ could be explained by the fact that at $a = 0$ the matrix exponential
becomes the identity.

• The inverse function decays the least, with only a factor of about ten between
$r = 0$ and $r = 14$.

• The density matrix, gaussian, logarithm, and Cauchy distribution all decay 
more quickly at small r than at large r. Among these, the density matrix 
has the smallest matrix magnitude at small disorders, while the gaussian, 
logarithm, and Cauchy distribution have similar matrix magnitudes. As we will 
see throughout the numerical results, these four functions behave in generally 
the same way, except that the density matrix is better behaved at small 
disorders. 

The trends change a bit at larger disorders. I present here the results for
$\sigma = 4.65$, which are representative of all but the smallest disorders:

• All graphs are flattening out to become straight lines except at small r. This 
is a sign of the coherence length, which I will shortly discuss in more detail. 

• Again the inverse decays most slowly and the exponentials the most quickly. 

• The exponentials decrease extremely rapidly, losing nine orders of magnitude 
by r = 5. This immense decrease is accentuated more and more as the disor- 
der increases. At the same time the statistical distribution of the exponential's 
observables becomes very strange. On the graph shown here, the observed 
second moment of the matrix magnitude is of the same order as the matrix 
magnitude itself. Moreover the kurtosis is huge, signifying that the statistical 
distribution is no longer gaussian and has a large tail. 

• At small $r$ the magnitudes of the Gaussian and the Cauchy distribution are
larger than those of the logarithm and the density matrix. However at large $r$
they catch up and the trend is reversed. The seeming correspondence between
the Gaussian and the Cauchy distribution is an artifact; as the disorder is
increased the Gaussian becomes smaller and smaller compared to the Cauchy
distribution.

In order to understand these results better, I fit the matrix magnitudes to a 
simple formula modeling a coherence length: $\ln MDP(f, f, r) \propto -\eta \ln r - 2 r L^{-1}$, 
where the inverse coherence length is $L^{-1} = \alpha\sigma + \beta + \gamma/\sigma$. Note that this 
expression for the coherence length is just a perturbative expansion in powers of 
the inverse disorder $1/\sigma$ and thus can be expected to fail at small disorder. In fact I 
was unable to obtain satisfactory fits for any function other than the inverse except 
by omitting small disorders ($\sigma < 2.0$) from the fit. I experimented with a couple 
of other parameterizations of the matrix magnitude and found that the fit was not 
significantly improved, which leads me to conjecture that the matrix magnitude at 
low disorders is determined by nontrivial physics. By imposing the cutoff at low $\sigma$ I 
was able to obtain a chi squared per data point typically around 0.30, but ranging from as 
low as 0.07 (for the inverse at $\mu = 0$) to as high as 0.5 (for the inverse at $\mu = 6$). 
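
The structure of such a fit can be sketched in a few lines. The block below is only an illustration under stated assumptions, not the thesis software: it assumes the model ln M = c - eta ln r - 2 r / L with 1/L = alpha*sigma + beta + gamma/sigma, holds the power eta fixed at an integer, and notes that the remaining unknowns (c, alpha, beta, gamma) then enter linearly, so ordinary least squares suffices. All function names and the numerical values are hypothetical, and the data are synthetic.

```python
import math

def design_row(r, sigma):
    """Coefficients of the unknowns (c, alpha, beta, gamma) for one data point."""
    return [1.0, -2.0 * r * sigma, -2.0 * r, -2.0 * r / sigma]

def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_fixed_eta(samples, eta):
    """Least-squares fit of (c, alpha, beta, gamma) with the power eta held fixed.

    samples is a list of (r, sigma, ln_magnitude) tuples with r > 0.
    """
    rows = [design_row(r, sigma) for r, sigma, _ in samples]
    # Move the known power-law term eta * ln r to the left-hand side.
    targets = [y + eta * math.log(r) for r, _, y in samples]
    size = 4
    normal = [[sum(row[i] * row[j] for row in rows) for j in range(size)]
              for i in range(size)]
    moment = [sum(row[i] * t for row, t in zip(rows, targets))
              for i in range(size)]
    return solve_linear(normal, moment)

def synthetic_samples(c, alpha, beta, gamma, eta):
    """Noise-free data generated from the model itself (illustrative only)."""
    samples = []
    for sigma in (2.0, 3.0, 4.65, 6.0, 9.0):
        inverse_length = alpha * sigma + beta + gamma / sigma
        for r in range(1, 6):  # r = 0 is excluded because of ln r
            y = c - eta * math.log(r) - 2.0 * r * inverse_length
            samples.append((float(r), sigma, y))
    return samples

TRUE_PARAMS = (0.3, 0.12, 0.05, 0.4)  # made-up c, alpha, beta, gamma
ETA = 2                               # made-up integer power
FITTED = fit_fixed_eta(synthetic_samples(*TRUE_PARAMS, ETA), ETA)
```

Repeating the fit for several integer values of eta and comparing the residuals would mimic the integer-constrained search; with real, noisy data the normal equations used here may be less well conditioned than in this toy case.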

The exponential function resisted any satisfactory fit. The problem appeared 
to be rapid oscillations with r, which can be seen superimposed on the overall 
exponential decrease. This suggests that the lattice effect is not damped in the 
exponential, but continues to play an important role at all disorders. 

In contrast, the inverse could be fit satisfactorily even when including small 
disorders. At $\mu = 0$ the parameter $\gamma$, which controls the $1/\sigma$ term, comes out to a 
number very close to zero. I am not sure whether this really indicates that the 
inverse has relatively trivial small-disorder physics; in particular if I leave the lowest 
disorders out of the fit I get a somewhat better chi squared and $\gamma$ becomes non-zero. 
I report here the fit with all disorders included because it is extraordinary that one 
can obtain this fit. 

The following table of fitting parameters is meant to give only a qualitative 
picture of the results. The quantitative aspects, for instance an estimate of the 
uncertainty in the fitting parameters, are plagued by systematic problems, including 
both the necessity of leaving small disorders out of the fit and also effects caused 
by the finite lattice size. 

This table offers two pieces of qualitative information: 

• The parameter $\alpha$ is almost the same for all of the fits. This parameter controls the 
large disorder limit of the coherence length, where $L^{-1} \approx \alpha\sigma$. So all of the 
five fitted functions are controlled by the same coherence length, but differ in 
their small-disorder physics. 

• The power laws, indicated by the parameter $\eta$, are also very interesting. The 
literature contains predictions that in three dimensions the inverse (or Green's 
function) should have a ^ behavior [53], and that the density matrix should 
have a ^ behavior [24]. These predictions are confirmed by the fits I report 
here. However, I am not aware of similar theoretical predictions for the other 
matrix functions; developing such would be an interesting challenge for 
theorists, and perhaps a first step on the road to the more important problem of 
understanding the small-disorder physics. 

Figure 4.3: The absolute error at small disorder $\sigma$ = 1.15, normalized by its 
value at r = 5. 

Figure 4.4: The absolute error at intermediate disorder $\sigma$ = 4.65, normalized 
by its value at r = 5. 

Note that I left r = 0 out of all the fits, and imposed an integer constraint on 
the power $\eta$. When I let $\eta$ vary continuously, its best fits were usually fractional 
powers. However the integer powers which I report here had chi squareds which 
were only ten percent worse. Choosing another integer power gave much worse 
chi squareds. It is worth noting that the exponential fits, though 
unsatisfactory, showed a clear preference for $\eta$ = 4. 



4.2.2 The Absolute Error 

As I mentioned earlier, the important question about the absolute error is: does it 
remain fairly constant within the truncation volume? This condition is necessary for 
the validity of estimate 4.7. Graph 4.3 displays the absolute error at small disorder 
$\sigma$ = 1.15, normalized by its value at r = 5. In all cases except for the exponential, 
it is very close to constant. Graph 4.4, at $\sigma$ = 4.65, shows that as the disorder 
increases the absolute error begins to vary a bit, and in particular increases sharply 
at r = 0. At this intermediate disorder the absolute error still varies by only an order 
of magnitude and can still be considered of order one. However, as the disorder 
crosses the Anderson transition the increase at r = 0 gets bigger, so for the inverse 
it becomes a factor of 61 at $\sigma$ = 9.00. Moreover, in the case of the inverse function 
the magnification of the absolute error seems to start to extend to larger radii, as 
far as r = 2. Unfortunately these results do not allow an extrapolation to large 
truncation volumes, so it is impossible to say whether the variations in the absolute 
error increase as the truncation volume is increased. 

Clearly the behavior of the absolute error depends on which matrix function is 
being considered. In the case of the exponential it decreases precipitously, while for 
the other functions it is roughly of order one except at large disorders. However one 
can clearly distinguish functions which tend to decrease away from the truncation 
volume's boundary (the gaussian and the Cauchy distribution), the inverse function 
which tends to increase, and functions which stay even closer to constant (the 
density matrix and the logarithm.) 

I do not show the exponential in graph 4.4 because of rounding errors. My 
software calculates the absolute error $MDP(f - \tilde{f}, f - \tilde{f}, r)$ via the identity: 

$MDP(f - \tilde{f}, f - \tilde{f}, r) = MDP(f, f, r) + MDP(\tilde{f}, \tilde{f}, r) - 2 \times MDP(f, \tilde{f}, r)$   (4.8) 

This formula works well for most cases and saved the work that would be required 
for implementing and testing subtraction and then rerunning all the previous cal- 
culations. The exponential, however, varies over many many orders of magnitude, 
and therefore rounding errors ruined the absolute and relative errors except within 
certain ranges of the disorder, the radius, and the parameter a. This was one ex- 
ample of the often very painful feature triage process that is required in writing and 
testing reliable software. 
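
The rounding problem described above can be reproduced in miniature. The following sketch assumes, purely for illustration, that the matrix magnitude behaves like an ordinary inner product, so that the identity reads |f - g|^2 = |f|^2 + |g|^2 - 2 f.g. The function names and the numbers are hypothetical; the point is only that the expanded form loses all significant digits when the two inputs are large and nearly equal, while subtracting before squaring does not.

```python
def squared_distance_direct(f, g):
    """Sum of (f_i - g_i)^2, subtracting before squaring."""
    return sum((a - b) ** 2 for a, b in zip(f, g))

def squared_distance_expanded(f, g):
    """The same quantity via |f|^2 + |g|^2 - 2 f.g (the expanded identity)."""
    ff = sum(a * a for a in f)
    gg = sum(b * b for b in g)
    fg = sum(a * b for a, b in zip(f, g))
    return ff + gg - 2.0 * fg

# Two large, nearly equal vectors: the true squared distance is about 2e-6,
# far below the rounding granularity of the individual terms (about 1e17).
F_VALUES = [1.0e8, 2.0e8]
G_VALUES = [value + 1.0e-3 for value in F_VALUES]
EXACT = 2.0e-6

DIRECT = squared_distance_direct(F_VALUES, G_VALUES)
EXPANDED = squared_distance_expanded(F_VALUES, G_VALUES)

DIRECT_RELATIVE_ERROR = abs(DIRECT - EXACT) / EXACT
EXPANDED_RELATIVE_ERROR = abs(EXPANDED - EXACT) / EXACT
```

In double precision the expanded form can only return a whole number here, so it cannot represent the small true answer, and depending on how the individual roundings fall it may even come out negative, which is the symptom reported for the exponential.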

4.2.3 The Relative Error 

The question of greatest interest is: when does the relative error become small? 
Figure 4.5 shows the relative error at three disorders: $\sigma$ = 1.15, 2.65, 4.65. I don't 
show higher disorders because the results don't change qualitatively. Concentrating 
on the intermediate four matrix functions (the gaussian, logarithm, density matrix, 
and Cauchy distribution), one immediately notes that their relative errors are very 
similar, although a slight splitting appears as the disorder increases. The exception 
to this rule is when both the radius and the disorder are small, in which case the 
density matrix is considerably smaller than the gaussian and the Cauchy distribution, 
and the logarithm lies somewhere in between. When does the relative error of these 
four functions become small? Clearly at $\sigma$ = 1.15 the r = 0 relative error is already 
of order 10^^ or better, while by $\sigma$ = 2.65 the error is 10~^ at radii up to about 
2. Moreover, even at $\sigma$ = 1.15 the error shows an exponential dependence on r 
caused by the coherence length, suggesting that it could be made arbitrarily small by 
increasing the radius of the truncation volume. As the disorder increases, the slope 
of the error also increases, suggesting that smaller and smaller truncation volumes 
are necessary to obtain a given error. 

Turning to the exponential, its relative error is already very small at $\sigma$ = 1.15 
and diminishes very quickly with the disorder. This function is extremely well suited 
to O(N) algorithms. 

Figure 4.5: The relative error at small disorder $\sigma$ = 1.15 and intermediate 
disorders $\sigma$ = 2.65 and $\sigma$ = 4.65. 

The inverse is the opposite case: the lowest value of its relative error is 0.05, 
which is reached at $\sigma$ = 9.0, r = 0. Its maximum variation with r is a factor of 
34, again at $\sigma$ = 9.0. This leaves little hope that the error could be decreased 
substantially by expanding the truncation volume. Therefore, basis truncation O(N) 
algorithms seem to be unsuitable for calculating the real part of the matrix inverse, 
at least when the inverse's argument has only an infinitesimally small imaginary 
part. 

This conclusion must be qualified by three points: 

• My results do not close the door on O(N) algorithms which do not use basis 
truncation. 

• The imaginary part of the inverse is the Cauchy distribution, which is tractable 
for basis truncation algorithms. The imaginary part of the inverse is the only 
thing that matters when evaluating a matrix function via the complex 
integration formula $f(H) = (2\pi i)^{-1} \oint dE\, (E - H)^{-1} f(E)$. Therefore, basis 
truncation algorithms can calculate the inverse as an intermediate step 
towards calculating other functions. The real part of the intermediate results 
will be worthless, but that doesn't matter. 

• When the inverse's argument has a finite imaginary part, basis truncation 
algorithms may still be applicable. Smirnov and Johnson [3,4] did a detailed 
study of the accuracy of a basis truncation algorithm when calculating the 
resolvent $(E - H)^{-1}$ in an ordered system. They showed that when the 
energy E has a finite imaginary part the error falls off exponentially with r. 
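
The complex integration formula mentioned in the second point can be checked numerically on a toy problem. The sketch below is not a basis truncation algorithm; it simply evaluates f(H) = (2 pi i)^(-1) times the contour integral of f(E) (E - H)^(-1) dE for a hypothetical 2 x 2 symmetric matrix, using a circular contour and the trapezoidal rule, and compares the result with the Taylor series of the exponential. The matrix, the contour radius, and all function names are assumptions made for the example.

```python
import cmath
import math

def resolvent_2x2(energy, h):
    """Return (energy - h)^(-1) for a 2x2 matrix h given as nested lists."""
    a = energy - h[0][0]
    b = -h[0][1]
    c = -h[1][0]
    d = energy - h[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matrix_function_by_contour(func, h, radius, points):
    """Approximate func(h) by a contour integral over a circle around the origin.

    With E = radius * exp(i theta) one has dE = i E dtheta, so each sample
    of the trapezoidal rule contributes func(E) * E * (E - h)^(-1) / points.
    The circle must enclose every eigenvalue of h.
    """
    total = [[0j, 0j], [0j, 0j]]
    for j in range(points):
        energy = radius * cmath.exp(2j * math.pi * j / points)
        weight = func(energy) * energy / points
        g = resolvent_2x2(energy, h)
        for row in range(2):
            for col in range(2):
                total[row][col] += weight * g[row][col]
    return total

def exp_by_series(h, terms=40):
    """Reference value: exp(h) summed from its Taylor series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # After this step, power holds h^n / n!.
        power = [[sum(power[i][k] * h[k][j] for k in range(2)) / n
                  for j in range(2)] for i in range(2)]
        for i in range(2):
            for j in range(2):
                result[i][j] += power[i][j]
    return result

H_EXAMPLE = [[0.5, 0.2], [0.2, -0.3]]  # hypothetical matrix, eigenvalues inside |E| < 2
CONTOUR = matrix_function_by_contour(cmath.exp, H_EXAMPLE, radius=2.0, points=64)
REFERENCE = exp_by_series(H_EXAMPLE)
MAX_DIFFERENCE = max(abs(CONTOUR[i][j] - REFERENCE[i][j])
                     for i in range(2) for j in range(2))
```

The only matrix operation the contour method needs is the resolvent at each sample energy, which is why an algorithm that can compute the inverse could, in principle, be reused for other matrix functions.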

The last question of interest is whether the relative error is of order one at the 
boundary of the truncation volume. For all the functions except the exponential, 
the answer is an unambiguous yes. In fact, as the disorder increases the r = 5 
values converge to one, so that at $\sigma$ = 9.0 the maximum disagreement from one is 
exhibited by the inverse with a value of 1.7. The exponential is, perhaps, another 
matter: its r = 5 value gets progressively smaller, until at $\sigma$ = 9.0 the a = 1.6 
exponential has a value of 3 x 10^^. However maybe even this can be considered 
to be order one in light of the exponential's extremely fast variation with r. 

Finally I discuss the validity of the error estimate in equation 4.7, which rests 
on both the order one question discussed in the previous paragraph and on the 
uniformity of the absolute error throughout the truncation volume. All the functions 
examined here (with the possible exception of the exponential) justify the first 
assumption. However, as discussed earlier, the spatial uniformity is more dubious, 
especially for the inverse and for large disorders. The gaussian, logarithm, density 
matrix, and Cauchy distribution are more or less invariant within the truncation 
volume except at very high disorder. On the other hand, the inverse function is 
elevated at small radii, and this is important in making the inverse's relative error 
so large. Lastly, the matrix exponential varies by many orders of magnitude within 
the truncation volume. Equation 4.7 could be used nonetheless as an upper bound 
for the exponential's relative error. 

4.2.4 Software Reliability and Reproducibility 

The software used to obtain these results is similar to that used to obtain the 
results in chapter 3 and should be able to reproduce exactly that chapter's results, 
although I haven't checked. Only a small percentage of the total number of lines 
of code was changed. This software includes an automated test suite which tests 
all computational functions except the highest level output (printing and graphing) 
routines. Moreover, I have taken pains to enable other researchers to easily re- 
produce and check my results, simply by installing my software and the libraries it 
depends on, compiling it with the GNU gcc compiler [49], and starting it running. 
The software, with the configuration files needed to reproduce all the numerical 
results and graphs presented in this paper, will be made available under the GNU 
Public License [7]; check www.sacksteder.com for further details. 

For the reader's benefit it is important to discuss the biggest new risk, asso- 
ciated with the exponential function. I have already mentioned that some of the 
(unpublished) results for that function are incorrect; in particular sometimes the par- 
tial matrix magnitude of the absolute error, which is positive definite, is computed 
to be negative. Moreover, there is a consistent trend for the standard deviation 
of observables associated with the exponential to be of the same order as the ob- 
servables themselves. This second effect, while disturbing, may accurately reflect 
the true physics. I believe that any unreliability in the computations is caused by 
rounding errors related to the exponential's huge changes in magnitude. I have 
verified that the algorithm in use is in fact vulnerable to rounding errors when oper- 
ating on numbers with widely varying magnitudes, have checked that the negative 
magnitudes do not occur when the parameters are such that the exponential varies 
less, and have avoided publishing any data about the absolute error except in the 
cases where the exponential's variation is most restricted. I also note that exactly 
the same code was used to produce analogous results for other matrix functions 
which do not exhibit such large changes in magnitude, and that the results for these 
functions are free from any negative magnitudes. There is still some risk that the 
negative magnitudes are due to some other problem or bug, and that the results are 
more thoroughly wrong than I imagine. I believe that the chances of this being true 
are sufficiently small to allow publication in good conscience of the results about 
the exponential's absolute error. 

Chapter 5 



A Phenomenology of 
Eigenfunctions in 
Disordered Systems 

In disordered systems, eigenfunctions and other physical quantities generally display 
a finite coherence length, and possibly also a localization length [39,45,54]. The 
coherence length often has a value much different than the localization length, and 
both lengths can be physically important. Several theoretical approaches have been 
developed to calculate observables in these systems, including the coherent potential 
approximation [45,54], the replica sigma model [11,55], and the supersymmetric 
sigma model [56]. 

When studying a disordered system with hamiltonian H, a practical problem 
of real importance is how to efficiently evaluate matrix functions f(H). For 
instance, quantum calculations of condensed matter systems are typically limited by 
the computational cost of calculating the density matrix function $\rho$. Recently a 
class of approximate but speedy algorithms has been introduced for evaluating 
matrix functions in an amount of time which is proportional to the basis size [2]. One 
subclass of these O(N) algorithms is the basis truncation class, which truncates 
all the basis elements outside of a truncation volume and then does the function 
evaluation within the truncation volume. 

In chapter 3 I tried to estimate the size of the absolute error A incurred by basis 
truncation algorithms, and derived equation 3.4, which gives an exact expression 
for the error in terms of the eigenfunctions and eigenvalues of the truncated and 
untruncated systems. I repeat it here: 

$\langle x|A(H)|x + r\rangle =$ 
$\quad \times\; \langle x|a\rangle \langle a|H_1|c\rangle \langle c|H_1|b\rangle \langle b|x + r\rangle\, g(E_a, E_b, E_c)$ 

$g(E_a, E_b, E_c) =$ 

$H_1$ is the part of the hamiltonian which connects the truncation volume with 
the rest of the system, $|a\rangle$ and $|b\rangle$ are eigenstates of the truncated system, $|c\rangle$ is 
an eigenstate of the untruncated system, $n$ and $n_A$ are the level densities of the 
untruncated and truncated systems, and $g$ is a function describing the eigenvalues. 

This exact formula is useless without phenomenological estimates for the 
following quantities: 

• The level densities $n$ and $n_A$. These quantities can be estimated from fits to 
numerical studies, theoretical estimates using the coherent potential 
approximation, or convenient models. 

• The correlations between the eigenvalues $E_a, E_b, E_c$. Eigenvalue correlations 
have been studied in detail in the unlocalized regime, and precise predictions 
are available when the conductance g is large. It is well known that in localized 
systems the eigenvalue correlation is nil unless two states are close together. 
However, the eigenvalue correlations between an untruncated system and its 
truncated version have not been studied so much, except on the level of 
perturbation theory. Perturbation theory is of course invalid when there is 
strong mixing of states. 

• The matrix elements $\langle x|a\rangle$, $\langle a|H_1|c\rangle$, $\langle c|H_1|b\rangle$, and $\langle b|x + r\rangle$. One also needs 
their correlations with each other and with the eigenvalues. 

The phenomenology of these quantities must apply to systems where the coherence 
length is smaller than the truncation volume, because this is the regime where O(N) 
basis truncation algorithms work best [38]. It should apply both to systems which 
are unlocalized and systems which are localized. 

Of the three items listed above, the matrix elements may be the most challeng- 
ing, because one expects a complete change as the disorder becomes large. There 
is a well developed theory of eigenfunction amplitudes and correlations in the limit 
of infinite conductance, and some work has been done to extend these results to 
the regime of large but not infinite conductance. However, I am not aware of any 
phenomenology for estimating matrix elements in either the localized regime or the 
regime which is unlocalized but still has a small conductance. In this chapter I 
propose a phenomenological model of wavefunctions and matrix elements which 
incorporates only information about the coherence length and the decay length, and 
leaves out all the details of the disorder. This model facilitates some calculations 
which would otherwise be impracticable, and can also provide qualitative physical 
insight about effects of length scales on physical quantities. 

This phenomenological model is designed to allow calculation of expectation 
values of observables. I start by constructing an ensemble of random functions 
designed to have a specific coherence length and decay length. With this random 
ensemble in hand, I compose the desired observable, and then average to obtain the 
expectation value. The result of the average is naturally expressed in the momentum 
basis; evaluation of results in the position basis requires evaluation of integrals which 
are mathematically similar to loop integrals in quantum field theory. 

I begin this chapter by explaining the random ensemble and the averaging pro- 
cedure. Next comes an explanation of how to compute the participation ratios and 
a demonstration that this model predicts that the eigenfunctions are multifractal. 
Then there is a check of the model's description of localization and a simple example 
of calculating the value of an observable. I end with brief discussions about the 
possibility of modeling correlations between eigenfunctions and about the relationship 
between this phenomenological model and the supersymmetric sigma model. 

5.1 The Model 

I start by incorporating into my model the correlation function, from which the 
correlation length can be derived: 

$C(x) = \int d^D k\, |\psi(k)|^2 \exp(i k \cdot x) = \int d^D y\, \psi(y)\, \psi^*(x + y)$   (5.2) 

This function completely determines the magnitudes of the amplitudes $\psi(k)$. 
An eigenfunction centered at a position $o$ must include a phase factor $\exp(i k \cdot o)$; I 
consider $o$ as a random variable and average over it. In addition, some other phase 
factor $\phi(k)$ may be present. Thus I have: 

$\psi(k) = \beta(k) \exp(-i k \cdot o) \exp(i \phi(k))$   (5.3) 

$\beta(k)$ is the normalized square root of the correlation function, $\beta(k) \propto \sqrt{|C(k)|}$. 
I require that the wave functions be normalized by setting $1 = C(x = 0) = \int d^D k\, |\psi(k)|^2$. 

I next insert information about the localization of the spatial wave function $\psi(x)$ 
by choosing a rule for averaging the phase over the disorder: 

$\langle \exp(i \phi(k)) \exp(-i \phi(k')) \rangle = f(k - k')$   (5.4) 

I will call $f(k - k')$ the phase correlation function. Obviously $f(0) = 1$. I 
decompose products of even numbers of phase factors into pairs of phase factors (a 
la Wick's theorem), and set averages of odd numbers of phase factors to zero. The 
above rules apply only to phase factors of different components $\psi(k)$ of the same 
eigenfunction $\psi$; for the moment I assume that phase factors of two different 
eigenfunctions are uncorrelated. Each eigenfunction will also have its own translational 
phase $\exp(i k \cdot o)$ which must be averaged over. 
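
A one-dimensional toy version of this construction may help fix ideas. The sketch below draws members of the ensemble psi(k) = beta(k) exp(-i k o) exp(i phi(k)) on a discrete momentum grid and evaluates C(x) as a Riemann sum of |psi(k)|^2 exp(i k x). Several things here are assumptions made only for illustration: the gaussian shape of beta, the grid, the choice of independent uniformly random phases phi(k) (which corresponds to the delta-function phase correlation of Berry's conjecture rather than to a general f), and all function names. The sketch shows that the correlation function depends only on beta, not on the centre o or on the phases.

```python
import cmath
import math
import random

def make_beta(momenta, length, dk):
    """Gaussian beta(k), normalised so that the sum of beta^2 * dk equals 1."""
    raw = [math.exp(-(k * length) ** 2 / 4.0) for k in momenta]
    norm = math.sqrt(sum(b * b for b in raw) * dk)
    return [b / norm for b in raw]

def draw_amplitudes(momenta, beta, centre, rng):
    """One ensemble member: beta(k) exp(-i k o) exp(i phi(k)), random phi(k)."""
    amplitudes = []
    for k, b in zip(momenta, beta):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        amplitudes.append(b * cmath.exp(-1j * k * centre) * cmath.exp(1j * phi))
    return amplitudes

def correlation(momenta, psi, x, dk):
    """C(x) as a Riemann sum of |psi(k)|^2 exp(i k x) over the momentum grid."""
    return sum(abs(a) ** 2 * cmath.exp(1j * k * x)
               for k, a in zip(momenta, psi)) * dk

DK = 0.05
MOMENTA = [DK * (j - 200) for j in range(401)]  # grid from -10 to 10
LENGTH = 1.5                                    # hypothetical correlation length R
BETA = make_beta(MOMENTA, LENGTH, DK)

RNG = random.Random(0)
PSI_A = draw_amplitudes(MOMENTA, BETA, centre=3.0, rng=RNG)
PSI_B = draw_amplitudes(MOMENTA, BETA, centre=-7.0, rng=RNG)

POSITIONS = [0.0, 0.5, 1.5, 4.5]
C_A = [correlation(MOMENTA, PSI_A, x, DK) for x in POSITIONS]
C_B = [correlation(MOMENTA, PSI_B, x, DK) for x in POSITIONS]
```

Both draws give the same C(x), with C(0) = 1 by the normalisation of beta, and C(x) falls off on the scale set by LENGTH.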

These rules complete the definition of the phenomenological model presented in 
this chapter. They are equivalent to a generating function: 

/(J, J*) = ^ j do'^e^ 

f vD ^ r ^ r 
K = dk dk J{k)J*{k)j3{k)(3{k) 

X f {k - k)exp{-io ■ {k - k)) (5.5) 

As usual, correlation functions are obtained by taking derivatives with respect 
to the sources $J$, $J^*$, and then setting the sources equal to zero. V is the system 
volume, and both $J$ and $J^*$ carry units of $\sqrt{V}$. 

Berry conjectured that eigenfunctions in a quantized chaotic system are sums 
of plane waves, each with a random phase [5]. This corresponds to choosing the 
phase correlation function $f$ to be a delta function, in which case the kernel reduces 
to: 

K=^ j df\J{k)\^(3\k) (5.6) 

In this case, the phenomenological model presented here corresponds exactly 
with Srednicki's mathematical formulation of Berry's conjecture [57], and can be 
used to calculate any observable. As shown by Prigodin [58], Prigodin et al. [59], 
and Srednicki [57], calculations using the supersymmetric sigma model result in the 
same values of observables if the system being computed is not spatially extended 
or, equivalently, has an infinite conductance g. Several authors have used the 
supersymmetric sigma model to compute corrections to the values obtained via 
Berry's conjecture in powers of the inverse conductance $1/g$ [60-63]. Their results 
are necessarily applicable only to the regime of large g. I will discuss the relation 
between their results and this phenomenological model at opportune moments. 
As an exercise, I compute the average two point correlation function. 



$\langle \psi(k) \psi^*(k') \rangle = \beta(k) \beta(k') \langle \exp(-i (k - k') \cdot o)\, \exp(i (\phi(k) - \phi(k'))) \rangle$   (5.7) 

The average over $o$ results in a delta function divided by the system volume 
V, enforcing momentum conservation. Thus one obtains 
$\langle \psi(k) \psi^*(k') \rangle = \beta^2(k)\, V^{-1} (2\pi)^D \delta^D(k - k')$. Conversion of this result to position space requires 
two integrals over momentum; however momentum conservation gets rid of one of 
the integrals. Inspection of equation 5.2 shows that the Fourier transform $C(k)$ of 
the coherence function is real; 

$\langle \psi(x) \psi^*(x') \rangle = C(x - x') / V$   (5.8) 

Equation 5.8 is a consequence of the assumption that each individual wave 
function in the ensemble shares the same correlation function $C(x)$. This assumption 
is probably the weakest point of the model presented in this paper. In general 
one would expect it to be wrong; that each individual eigenfunction would have a 
distinct shape and correlation function. 



5.2 The Participation Ratios and Multifractality 

Next I calculate the averages of the participation ratios: 

$\langle P^n \rangle = \langle \int d^D x\, (\psi(x) \psi^*(x))^n \rangle$ 

$h(o) = \int d^D k\, d^D k'\, \beta(k) \beta(k') f(k - k') \exp(-i o \cdot (k - k'))$   (5.9) 

In the case of Berry's conjecture, $h = 1$ and $P^n = n!\, V^{1-n}$. This result 
can be rewritten in terms of the probability distribution $P(|\psi|^2)$ of eigenfunction 
magnitudes: $P(|\psi|^2) \sim V \exp(-V |\psi|^2)$. Setting V = 1 reduces this to a famous 
result from random matrix theory; the result with $V \neq 1$ can be derived using 
supersymmetry in the limit $g = \infty$. 
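
The factorial growth of the moments in the Berry case is easy to see in a small Monte Carlo experiment. The sketch below assumes V = 1 and models the wave function at a single point as a normalised sum of unit phasors with independent, uniformly random phases; the number of plane waves, the number of samples, the seed, and the function names are all arbitrary choices made for the example. For an exponential distribution exp(-y) of y = |psi|^2 the n-th moment is n!, and the sampled moments should approach 1 and 2 for n = 1 and n = 2, up to statistical noise and a small correction from the finite number of waves.

```python
import cmath
import math
import random

def sample_intensity(waves, rng):
    """|psi|^2 at one point for a normalised sum of random-phase plane waves."""
    amplitude = sum(cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
                    for _ in range(waves))
    return abs(amplitude) ** 2 / waves

def intensity_moments(samples, waves, max_power, seed=0):
    """Sample averages of y^n for n = 0 .. max_power, with y = |psi|^2."""
    rng = random.Random(seed)
    totals = [0.0] * (max_power + 1)
    for _ in range(samples):
        y = sample_intensity(waves, rng)
        for n in range(max_power + 1):
            totals[n] += y ** n
    return [total / samples for total in totals]

MOMENTS = intensity_moments(samples=20000, waves=20, max_power=2)
```

With only a finite number of waves the second moment is expected to sit slightly below 2, so agreement should be judged within a tolerance rather than exactly.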

The participation ratios $P^n$ are closely related to the singularity strength of 
fractals and multifractals; in fact a fractal can be characterized by how its $P^n$ scale with 
the system volume V [64-67]. A function with fractal dimension D* which lives in 
a world with dimension D will have participation ratios whose volume dependence 
is given by $P^n \propto V^{(1-n) D^*/D}$. This relation can be used to define the fractal 
dimension. In contrast, the multifractal has a fractal dimension for each participation 
ratio; D* is a function of n. 
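
As a concrete reading of this scaling relation, the sketch below extracts an effective dimension from participation ratios measured at several volumes, assuming the form P^n proportional to V^((1 - n) D*/D) and using a least-squares slope on a log-log scale. The function name and the synthetic numbers are hypothetical; applying the same estimate separately for each n and finding that the result depends on n would be the signature of a multifractal.

```python
import math

def fractal_dimension(volumes, ratios, n, dimension):
    """Estimate D* from P^n ~ V^((1 - n) D* / D) via a log-log slope (n != 1)."""
    xs = [math.log(v) for v in volumes]
    ys = [math.log(p) for p in ratios]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope * dimension / (1.0 - n)

# Synthetic check: data generated with a known D* in a D = 3 world.
DIMENSION = 3
TRUE_D_STAR = 2.4
VOLUMES = [10.0 ** 3, 20.0 ** 3, 40.0 ** 3]
ESTIMATES = {}
for power in (2, 3, 4):
    exponent = (1 - power) * TRUE_D_STAR / DIMENSION
    ratios = [v ** exponent for v in VOLUMES]
    ESTIMATES[power] = fractal_dimension(VOLUMES, ratios, power, DIMENSION)
```

For this synthetic simple fractal every n returns the same dimension; real data would of course carry prefactors and noise that a plain slope fit only partly removes.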

Theoretical arguments and numerical calculations indicate that wave functions 
near the localization transition are multifractals [6,68]. The model presented in this 
paper clearly allows wave functions at criticality to be modeled by simply choosing 
functions $\beta$ and $f$ which result in a multifractal $h$. For instance, if $f$ is taken equal 
to 1, implying no localization, then $\beta$ would have to be chosen to be a multifractal. 
Note that this model only allows calculation of average participation ratios; however 
I believe that if the average ratios indicate multifractality then the individual wave 
functions should also be multifractal. 

It would be nice to find a choice of $\beta$ and $f$ which reproduces supersymmetric 
predictions for the regime of large conductance g. Fyodorov and Mirlin [60] 
obtained the participation ratios $P^n = n!\, V^{1-n} (1 + \frac{1}{2} \Pi\, n(n - 1))$, which corresponds 
to the probability distribution $P(y = V|\psi|^2) = \exp(-y)(1 + \Pi(1 - 2y + y^2/2))$. 
The diffuson loop $\Pi$ is a geometry dependent constant determined by the spectrum 
of the system's diffusion operator $\nabla^2$; in two dimensions it is related to the volume 
V via the equation $\Pi$ = 2^^t^^. This result assumes that V^Ii/jI" is small and 
therefore is valid only for participation ratios $P^n$ which have small n. Working in 
two dimensions, Falko and Efetov [63] were able to go a step further by using an 
instanton technique; i.e. they chose a saddle point which was not translationally 
invariant. Using this improved prescription, they were able to calculate the probability 
of large amplitudes [69]: 

Piy^V\^\') = cxp(-^ln^(n2;)),i<2; 

= exp(-y + nyV2),-^<y< 1 (5.10) 



The most intriguing aspect of these results is their prediction that the 
eigenfunctions are weakly multifractal. The diffuson loop $\Pi$ gives Fyodorov and Mirlin's 
results an extra volume dependence which translates to multifractality. One can 
see this from the fact [69] that at large g their participation ratio is equal to: 

Therefore the participation ratios derived by Fyodorov and Mirlin scale with the 
volume as y("^i)(i+"/47r3). This nonlinear exponent signifies multifractality. Falko 
and Efetov's results are also multifractal. 

Interestingly enough, the model presented here can easily reproduce the structure 
of Fyodorov and Mirlin's results and therefore describe a multifractal system, with 
the following choice of the phase correlation function $f$: 

f{k) cx V-^S^ik) + aiWd^{k) + a-i'^^S^ik) (5.12) 

The $a$s are just small (perturbative) constants, with $a_2 \propto a_1^2$. Then, to second 
order in $o$, the function $h$ defined in equation 5.9 is given by: 

$h(o) = 1 + a_3 \cdot o + \frac{1}{2}\, o \cdot a_4 \cdot o$   (5.13) 

$a_3$ is the first derivative, while $a_4$ is the second derivative, which is a D x D tensor. 
Again keeping only lowest order terms, one obtains the following value for the n-th 
power of $h$: 

$h^n(o) = 1 + n\, a_3 \cdot o + \frac{n}{2}\, o \cdot a_4 \cdot o + \frac{n(n-1)}{2} (a_3 \cdot o)^2 + \ldots$   (5.14) 

Equation 5.9 requires integration over $o$. During this integration, the second term 
integrates to zero and the second derivative in the third term is replaced by its trace 
$\mathrm{Tr}(a_4)$. If the trace is zero then the lowest order term is of order $O(n(n - 1))$ 
and equation 5.9 gives $P^n = n!\, V^{1-n} (1 + a_5\, n(n-1)/2)$, which is just Fyodorov and 
Mirlin's multifractal result. It is very interesting that such a simple model can 
reproduce the results of a very complicated supersymmetric calculation. 

The odd thing though is that this perturbative calculation predicts multifractality 
for any perturbation to Berry's conjecture, in any dimension. Usually multifractality 
is expected only close to the Anderson transition. Fyodorov and Mirlin's prediction 
of multifractality was expected only because they derived their result in two dimen- 
sions. The critical dimension for the Anderson transition is two dimensions and 
therefore every eigenfunction is expected to be multifractal. In other dimensions 
multifractality is not generally expected. Perhaps the reason for the result here 
is that it is based on Berry's conjecture, which assumes that the system is fully 
chaotic. 



5.3 The Localization Length 

The phenomenological model presented here contains two parameters: $\beta(k)$, which 
is related to the correlation function $C(x)$, and the phase correlation function $f(k)$. 
I have proved that $\beta$ determines the model's correlation function, and clearly if 
$\beta$ decays with a momentum scale $R^{-1}$ then the wave functions have correlation 
length R. On the other hand, what determines the localization length L of 
localized wave functions? It turns out that L is equal to the maximum of two length 
scales: the correlation length R, and the length p which governs the phase 
correlation function $f$. To prove this assertion, I calculate the four point correlator 

$F(\Delta) = \langle |\psi(x)|^2\, |\psi(x + \Delta)|^2 \rangle$. Its long distance behavior will allow me to determine 
the model's localization length. 

Wick's theorem breaks F into the sum of three terms, and some algebraic 
manipulation results in the following identity: 

^ D 

^^(A) =yji^) f{k)[exp{ik ■ A)/(0, k) + 2/(A, k)] (5.15) 
The function $I$ is real and is defined as: 

I{S,k) = [ dk exp{i6 ■k)[3{k- -k)l3{k + -k)f (5.16) 

$I$ completely encapsulates the influence of the spatial correlation function $C(x)$ on 
the four-point function $F$, and $I(\delta, 0) = C^2(\delta)$. The second term in equation 5.15 
describes the two Wick terms which connect $x$ and $x + \Delta$, and clearly its long 
distance behavior is always governed by the correlation length R. In contrast, the 
decay of the first term in equation 5.15 is influenced by the phase correlation $f$ as 
well. To be more concrete, I consider two different choices of $\beta$ and $f$: gaussian 
decay, and exponential decay. 

5.3.1 Gaussian Decay 

I choose f3{k) = (^) exp{-k'^ R'^ / 4) and f{k) = exp[-k'^ / 4). Integrating 
over the gaussians just produces more gaussians; the final result is: 

F(A) = V-\2n\^)-'"\xp{^) + 2exp(-AVi?2)] (5.17) 

The length scale A is defined by = + ^R^ ■ Note that the localization length 
L is never smaller than the correlation length R, even if the phase decay length p 
is much smaller. In the opposite situation where $p \gg R$, the four point function 
resolves into two terms, the first decaying slowly with scale p and the second more 
quickly with scale R. As the phase decay length p goes to infinity, the first term 
becomes a constant, signalling that there is no localization. Thus I have L = 
max(p, R). 

5.3.2 Exponential Decay 

I work in three dimensions, assume that the phase correlation function $f(k)$ can be 
described in terms of its poles, and choose a coherence function $C(x)$ proportional 
to $\exp(-x/R)$. The corresponding $\beta$ is: 

TT 

By introducing a Feynman parameter I reduce the three integrals in the definition 
of I{S, k) down to a single integral: 

Jl{lk)^6 f dx{l-x)xexp{~6/R + t{x-l)S-k){n/Rf{l + {S/fi) + hs/pf} 
Jo 2 6 

(5.19) 

The new length scale $\mu$ is $\mu^2 = (R^{-2} + k^2 x(1 - x))^{-1} < R^2$. Turning to equation 5.15
for the four point correlator, it is obvious that the second term is dominated
by an exponential which decays at least as fast as $\exp(-2\Delta/R)$. The
first term is a little more tricky: one must do the integral over $k$ before the integrals
over the Feynman parameters. After angular integration, the $k$ integral
looks like J^^dkexp{ikA)p{k){p/Rf{p/Rf. Now assume that the phase cor- 
relation function f{k) has the same structure as in eauation l5.18l but with its poles 
at ±ip^^. Then the integrand in the k integral contains poles at fc = ±ip^^, at 
k — ±i{RyJ x{\ — x)) , and at fc = -izi^R^J x{l — £)) . It also has a branch 
cut caused by the non-analyticity of the square roots, but this can be hidden in 
the lower half plane. The poles are multiplied by an exponential exp{ikA), which 
causes decays lengths of p, R^J x{l — x), and R^J x(l — x). Thus I obtain the same 
qualitative behavior as with gaussian functions; in particular the localization length 
L is given by L = max{p,R). 

5.3.3 Correspondence with Supersymmetric Results

Blanter and Mirlin [61] used the supersymmetric sigma model to derive the following
result:

v'^F{K) = i + (x) (1 + n) + n(A) (5.20) 

$\Pi(\Delta)$ is the diffusion propagator; i.e. the inverse of $\nabla^2$, omitting the zero-momentum
mode. $\Pi = \Pi(0)$ is the diffuson loop that I have already discussed. Prigodin and
Altshuler [62] used a simplified model (a Liouville model) to derive the joint probability
distribution of wavefunction magnitudes at two different points, $P(|\psi(x)|^2, |\psi(y)|^2)$.
This probability distribution is equivalent to knowing the moments $\langle |\psi(x)|^{2n} |\psi(x + \Delta)|^{2m} \rangle$
for all n, m.

As I already mentioned, the model presented here is a generalization of Berry's
model, and therefore reproduces these results in the limit of infinite conductivity,
where the diffusion propagator is zero and the phase correlation function $f$ is a delta
function. Clearly one would hope to reproduce Blanter and Mirlin's result using
f{k) = V^^8^{k) +af{k), where a is a small parameter. However I haven't yet 
checked whether this is actually possible. 

5.4 Matrix Elements 

The original motivation of this work was to estimate certain matrix elements in a
disordered system, given only information about the coherence length and localization
length. As a simple example, I consider the case where the matrix element is
an integral over a surface of area $A$ described by the equation $s(x) = 0$. Then the matrix element
between two wave functions $\psi_1$ and $\psi_2$ is $M = \int d^Dx\, \delta(s(x))\, \psi_1(x)\, \psi_2(x)$. Since $\psi_1$
and $\psi_2$ must be averaged separately, the average matrix element is trivially zero.
However, the normalized average of the squared matrix element does not vanish:

$$(V^2 A^{-1}) \langle |M|^2 \rangle = (V^2 A^{-1}) \left\langle \int d^Dx\, d^Dy\, \delta(s(x))\, \delta(s(y))\, \psi_1(x)\, \psi_2(x)\, \psi_1^*(y)\, \psi_2^*(y) \right\rangle \tag{5.21}$$

This quickly converts to:

$$A^{-1} \int d^Dx\, d^Dy\, \delta(s(x))\, \delta(s(y))\, |C(x - y)|^2 \tag{5.22}$$

If the surface s has a curvature which is small compared to the correlation length R, 
then {V'^A-'^){\M\'^) will be proportional to At^E?A~^. This is the same scaling law 
that one obtains heuristically by arguing that the matrix element may be modeled 
as a sum over incoherent volumes of radius R. It is quite remarkable that the phase 
coherence has no effect whatsoever on the second moment of this matrix element. 

5.5 Final Remarks 

5.5.1 Correlations between Different Eigenfunctions 

So far I have ignored correlations between eigenfunctions and correlations between 
eigenfunctions and their energies. Supersymmetric results are again available in the 
limit of large conductance [61]: 

V^{\^l^{x, E)| A, E + uj)\^) = l + C2(f )n, uj<E^ (5.23) 
The same paper gives a more complicated formula for the case of the energy separation
$\omega$ being large compared to the Thouless energy $E_c$. These formulas predict
that the correlation between eigenfunctions is quite weak; this prediction is a
consequence of the assumption that the disorder is small. As the disorder becomes
large, one should expect eigenfunctions to be strongly correlated around individual
peaks and valleys of the disordered potential. In the localized regime, the orthogonality
of eigenfunctions implies that eigenfunctions that are close together will be
strongly correlated. Eigenfunctions that are far apart will be entirely uncorrelated.

The phenomenological model presented here can be extended to include these 
correlations by allowing the phase of different eigenfunctions to be correlated. One 
would introduce a phase correlation function between two different eigenfunctions 
i and j: 

$$f_{ij}(k - k') = \langle \exp(i\phi_i(k)) \exp(-i\phi_j(k')) \rangle \tag{5.24}$$

In the localized regime the functions would have to be zero when the eigen- 
functions are far apart; i.e. when \oi — dj\ is large compared to the localization 
length. 

5.5.2 Relation to Supersymmetric Results 

At several points in this chapter I have cited supersymmetric results and discussed 
whether the phenomenological model presented here can reproduce those results. 
However the validity of this phenomenological model cannot be judged by its abil- 
ity to reproduce supersymmetric results. This model's purpose is specifically to 
understand the physics of systems where the coherence length is not that large, 
even systems that are localized. The diffusive supersymmetric sigma model which 
was used to produce the cited results is not applicable to such systems. The main 
strength of the phenomenological model presented here is that it offers the hope 
of describing those systems. Its success or failure in reproducing supersymmetric 
results is more of a curiosity. 

Chapter 6



Supersymmetric Results 
Without Graded Matrices 

6.1 Introduction 

In this chapter I carefully derive a new sigma model which is suitable for calculations 
in random matrix theory and in mesoscopic physics. The new sigma model is a 
very important contribution to mesoscopic theory because it greatly simplifies the 
current state of the art in mesoscopic theory (the supersymmetric sigma model) 
while retaining the same reliability in nonperturbative calculations. As an exercise, 
I use the new sigma model to derive some basic results in random matrix theory. 

In addition to containing novel results, this chapter can be used as a tutorial. 
It gives an unusually thorough derivation of a sigma model, and offers the most 
detailed and thorough explanation in the literature of how to use a field theory 
to do reliable non-perturbative calculations of the gaussian unitary ensemble of 
random matrices. I do not presume that the reader already understands random 
matrix theory, mesoscopic physics, or the supersymmetric sigma model, but instead 
derive everything from first principles. I do presume a knowledge of linear algebra, 
calculus, and field theory. 

6.1.1 Random Matrix Theory 

Random matrix theory is the study of matrices containing random numbers. One 
starts by defining the set of possible matrices. Then one defines the probability that 
each matrix will occur, a different probability for each matrix. These two pieces of 
information, taken together, define an ensemble. For instance, consider the set of 
$N \times N$ hermitian matrices. Define the probability of a particular matrix H occurring
to be proportional to the gaussian: 

P{H)=expi-^TriH^)) (6.1) 

The choice of hermitian matrices, in combination with equation 6.1, defines the
gaussian unitary ensemble. Technically speaking, the size N is part of the ensemble
definition, but usually this is not specified because one expects that calculations
won't depend much on N. Other ensembles can be defined easily; for instance
the gaussian orthogonal ensemble has the same probability measure as the gaussian
unitary ensemble but differs by requiring that the matrices be real and symmetric.
This reduces the number of degrees of freedom in the matrices H and changes the
results of most calculations. In this chapter I will be treating hermitian matrices,
but the same techniques explained here can be applied to other ensembles.
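
To make the notion of an ensemble concrete, here is a minimal numerical sketch that draws one member of the gaussian unitary ensemble. The normalization $P(H) \propto \exp(-\frac{N}{2} \mathrm{Tr}(H^2))$ is an assumption made for this illustration (it keeps the spectrum in roughly $[-2, 2]$), and the function name `sample_gue` and the use of NumPy are likewise illustrative choices, not part of the derivation in this chapter.

```python
import numpy as np

def sample_gue(n, rng):
    """Draw one n x n hermitian matrix with weight exp(-(n/2) Tr H^2).

    Assumed normalization (chosen for this sketch): every matrix element
    has mean-square magnitude 1/n.
    """
    # Complex gaussian entries with unit mean-square magnitude.
    a = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    # Symmetrizing makes the matrix hermitian; the factor sets the variance.
    return (a + a.conj().T) / np.sqrt(2 * n)

rng = np.random.default_rng(0)
H = sample_gue(200, rng)
eigenvalues = np.linalg.eigvalsh(H)  # real, because H is hermitian
print("smallest and largest eigenvalue:", eigenvalues.min(), eigenvalues.max())
```

Replacing the complex entries by real ones (and adjusting the variance) would give a member of the gaussian orthogonal ensemble instead.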

Given a particular matrix H, one can compute all sorts of things about it: its 
eigenvalues and eigenvectors, as well as any number of matrix functions f{H). I 
will call these quantities observables, denote them with the symbol D, and denote 
their possible values with the symbol o. Given an ensemble of possible matrices, a 
particular observable D may have any number of values o, and its actual value will 
depend in a complicated way on which member of the ensemble has been (randomly) 
chosen. It can be very interesting to calculate the probability that observable D will 
have a certain value o. The average value of D can also be interesting. Random 
matrix theory concerns itself with probabilities and average values of observables. 
For example, one of the most important observables is the two point correlator, 
which describes whether eigenvalues like to cluster close to each other or instead 
stay far apart. This quantity is one of random matrix theory's most famous results, 
and will be calculated in meticulous detail later in this chapter. The significance of 
this calculation is not that the result is novel - the two point correlator has been 
calculated in other ways - but instead that the means by which it is obtained is 
novel and unusually simple. 

Random matrix theory is useful for describing physical problems that are ex- 
tremely complicated. It offers a way of calculating results while ignoring the details 
of the problem; the unspecified details are represented by the random numbers in 
the random matrices. In other words, random matrix theory is a way of mathematically
representing one's knowledge that complicated things are going on while
at the same time not specifying what those complicated things are. It has proven 
to give accurate predictions of certain quantities in nuclear physics and chaos [70]. 

6.1.2 Mesoscopic Physics 

Random matrix theory has become one of the most important theoretical techniques 
in the field of mesoscopic physics. Mesoscopic physics is a branch of condensed 
matter physics which is devoted to studying systems at a length scale from a few 
atomic spacings to a few hundreds of atomic spacings. Typically it is most concerned 
with the electrons, and treats the atoms as a medium through which the electrons 
move. In terms of quantum mechanics, this means that the atoms are represented as 
a more or less unchanging potential, while the electrons are represented as dynamic 
variables moving in that potential. 

Now usually the atoms are not under precise human control: one does not know 
or control precisely how many of which types of atoms there are, or exactly where 
they are. There may be 90% ± 2% silicon atoms, mixed with 10% other atoms, 
and one may not know anything about which atom is where. This situation is 
called "disorder." Despite not knowing this important information, one needs to 
make predictions about the electronic behavior. One way of handling the problem 
of disorder is a statistical approach: one enumerates the set of all possible ways 
the atoms could be arranged, including not only all the possible positions of the 
atoms but also the fact that there could be different numbers of each type of atom. 
One then defines the probability of each atomic arrangement. Then to calculate 
electronic behavior one calculates the average behavior, averaged over all possible 
atomic arrangements.

Quantum mechanics uses matrices to describe electronic motion. There is a 
matrix, called the Hamiltonian, whose numbers represent the potential that the 
electrons are moving in. One makes predictions about how the electrons will move
by computing various observables D of the Hamiltonian, just as I described earlier 
in relation to random matrices. In disordered systems one doesn't know some of the 
numbers in the Hamiltonian, so instead one defines an ensemble of possible Hamil- 
tonians and then averages the observables associated with electronic motion over 
this ensemble. In other words, the quantum mechanical theory of electronic motion 
in disordered media is a type of random matrix theory. In systems that are extended 
in one or more dimensions, the random matrices - random Hamiltonia - have a very 
special structure reflecting how electrons move from place to place. Only in sys- 
tems that look like a dot - a single point in space - do the random Hamiltonia have 
the gaussian random matrix structure which I showed earlier. The term "random 
matrix theory" is generally taken to refer only to the simpler case, often called a 
zero-dimensional system. Nevertheless, properly speaking the quantum mechanical 
theory of extended systems is also random matrix theory. 

The prototypical quantum mechanical model of a disordered system was the 
Anderson model [39]. It uses a basis laid out on a cubic lattice, one basis element 
per lattice site. (I mean "cubic lattice" in a sense which applies to any number D 
of dimensions; i.e. a lattice which is uniformly spaced along each orthonormal
direction in coordinate space. In two dimensions it would be a square lattice, et 
cetera.) The Anderson model specifies an ensemble of random Hamiltonian matrices 
which are symmetric and composed of two parts: 

• A regular part: $\langle x|H|y \rangle = 1$ if x and y are nearest neighbors on the lattice.
This term is, up to a constant, just the second order discretization of the
Laplacian; its spectrum consists of a single band of energies between $-2D$
and $2D$, where D is the spatial dimensionality of the lattice.

• A disordered part: Diagonal elements $\langle x|H|x \rangle$ have random values chosen
according to some probability distribution. One must select a probability
distribution; a popular choice is the gaussian distribution:

These rules completely specify the ensemble of random matrices which is called the 
Anderson model. This is the third random matrix ensemble which I have described 
so far. 
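
The two rules above can be sketched numerically as follows. This is only an illustration under stated assumptions: periodic boundary conditions, a dense matrix representation, and gaussian on-site disorder of standard deviation `sigma` are all choices made here, and the names `anderson_hamiltonian`, `L`, and `sigma` are hypothetical.

```python
import numpy as np

def anderson_hamiltonian(L, D, sigma, rng):
    """Dense Anderson Hamiltonian on a D-dimensional cubic lattice of width L.

    Assumptions of this sketch: periodic boundaries, hopping elements equal
    to 1, and gaussian diagonal disorder with standard deviation sigma.
    """
    if L < 3:
        raise ValueError("L must be at least 3 so that the two neighbors along an axis are distinct")
    volume = L ** D
    sites = np.arange(volume).reshape((L,) * D)  # one basis element per lattice site
    H = np.zeros((volume, volume))
    for axis in range(D):
        # Index of the nearest neighbor in the positive direction along this axis.
        neighbors = np.roll(sites, -1, axis=axis)
        H[sites.ravel(), neighbors.ravel()] = 1.0  # regular part
        H[neighbors.ravel(), sites.ravel()] = 1.0  # keep H symmetric
    # Disordered part: random diagonal elements.
    H[np.arange(volume), np.arange(volume)] = sigma * rng.standard_normal(volume)
    return H

rng = np.random.default_rng(0)
clean = anderson_hamiltonian(6, 2, 0.0, rng)
band = np.linalg.eigvalsh(clean)
print("band edges without disorder:", band.min(), band.max())
```

With `sigma = 0` the eigenvalues stay inside the band between $-2D$ and $2D$ described in the first rule; turning on the disorder broadens the spectrum.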



6.1.3 The Supersymmetric Sigma Model 

Although random matrix ensembles are very easy to specify, it can be very chal- 
lenging to compute averages or probabilities of observables in these ensembles. A 
number of involved mathematical techniques have been developed for such cal- 
culations, including several field theories of the sigma model type. The two most 
popular sigma models are Efetov's supersymmetric sigma model [56] and the replica 
sigma model [55]. The supersymmetric sigma model's strong point is its reliability 
in non-perturbative calculations, which contrasts with the fact that the replica sigma
model cannot be guaranteed to be correct in non-perturbative calculations. The
supersymmetric model is called supersymmetric because its degrees of freedom include
both Grassman variables and normal bosonic variables. It does not involve
the graded Poincare symmetry which is the starting point of the supersymmetric 
theory of elementary particles [71,72]. 

The key non-perturbative calculations which are the supersymmetric model's 
claim to fame are all zero-dimensional; they are essentially just calculations of gaus- 
sian random matrix ensembles. One dimensional systems are unusually tractable 
and can be solved exactly via several techniques, including the supersymmetric 
technique. In contrast, systems with two or more dimensions are too complicated 
and one has to make a perturbative approximation. This new approximation is 
called the diffusive approximation, and assumes that the system's conductance is 
large; i.e. that the disorder is small and electrons are conducted easily. In many 
supersymmetric calculations, one employs the diffusive approximation to show that 
D-dimensional highly conducting systems are mathematically equivalent to certain
specially chosen zero-dimensional random matrix ensembles. With this equivalence 
in place, one is able to calculate observables. Thus supersymmetric calculations 
of extended systems (with two or more dimensions) are an interesting mix of the 
perturbative diffusive approximation and a non-perturbative zero dimensional cal- 
culation. 

One of the biggest difficulties with the supersymmetric sigma model is the com- 
plexity of dealing with both Grassman variables and non-Grassman (bosonic) vari- 
ables. The variables are combined into graded matrices, i.e. matrices where half of 
the entries are Grassman variables and the rest are bosonic. Mathematical manipu- 
lation of graded matrices quickly becomes overwhelming, not because of any deep 
theoretical difficulty, but instead because of the huge number of details that one 
must keep track of. 

6.1.4 A New Sigma Model 

Recently Fyodorov derived a zero-dimensional model without Grassman variables 
which can be used to reliably do non-perturbative calculations of random matrix 
ensembles [9]. The new model is very attractive because it avoids much of the 
complexity of the supersymmetric model. This chapter (the one you are reading) 
presents a novel result: it generalizes Fyodorov's model to disordered systems in 
D dimensions. Both for the sake of rigor and to assist the newcomer, this
chapter derives the new model step by step in a tutorial sort of way.

Both the new sigma model and the supersymmetric sigma model are derived from 
exactly the same starting theory, so both should make identical predictions about 
the values and probability distributions of observables. However, one should always 
check the equivalence. Using the new sigma model and working on random matrices 
in zero dimensions, this chapter contains detailed calculations of the two most 
basic observables in random matrix theory: the density of states and the two point 
correlator. Shortly I will explain what these are, but for now let me just mention that 
the two point correlator is a non-perturbative result, which shows that the new sigma 
model equals the supersymmetric sigma model's strength with non-perturbative 
calculations. As I mentioned earlier, the extension of the supersymmetric model to 
extended systems is purely perturbative; after the new model's success with zero 
dimensional calculations, one has very good reason to expect success with extended 
systems as well. I do have plans to demonstrate the equivalence of the extended 
versions of the two sigma models by going on and calculating the two point correlator 
in extended systems, but this calculation will have to wait until after this thesis is
complete. 

6.1.5 Outline Of This Chapter 

Before starting the mathematics, here is a rough outline of the steps I will go 
through. In section 6.2 I present the random matrix ensemble of concern and the
observables that I'll be calculating. Then I set up the mathematical formalism, 
which is a generating function whose derivatives are various observables. Following 
the standard supersymmetric approach, I show that the generating function is equal 
to an integral over both bosonic variables and Grassman variables, and then - before 
doing the bosonic and fermionic integrals - I integrate over the disorder. Up to this 
point I am following exactly the same steps as the derivation of the supersymmetric 
sigma model, though I consider a somewhat larger class of problems than is usual. 
Returning to the usual equations is just a matter of setting certain parameters to 
have certain values. 

After the integration over disorder, one is left with a field theory with both 
bosonic and Grassman degrees of freedom. The next step in the supersymmetric 
method is to show that this field theory is mathematically equivalent to another the- 
ory where the degrees of freedom are matrices, one matrix for every point in space. 
The new matrices are graded matrices; they contain equal numbers of Grassman 
variables and bosonic variables. If the matrices are $N \times N$ matrices with $N^2$ degrees
of freedom, then $N^2/2$ of those degrees of freedom will be Grassman variables, and
the other $N^2/2$ will be bosonic variables. The reason for this conversion is that -
in zero dimensional models - the number of degrees of freedom in the new matrices
is much less than the number of degrees of freedom in the previous theory. In
other words, the supersymmetric method integrates out almost all of the degrees
of freedom and is left with these graded matrices. This step of integrating out
those degrees of freedom is called a Hubbard-Stratonovich transformation. It is
an exact integration without approximations, but the graded matrices make it very
complicated. In fact, everything about the graded matrices is quite complicated,
and the details of supersymmetric calculations are very taxing even when there are
no conceptual difficulties.

At this point (section 6.3) I part ways from the supersymmetric method, though
only in the details, not in the philosophy. Like the supersymmetric method, I do
change to a new set of variables which (in the case of zero dimensional models)
is much smaller than before. However the new variables are not supersymmetric
matrices. Whereas the supersymmetric method would arrive at one $N \times N$ graded
matrix, I arrive at two $\frac{N}{2} \times \frac{N}{2}$ matrices, each of them having $N^2/4$ bosonic variables.
Thus my two matrices, taken together, have the same number of bosonic variables
as the graded matrices derived by the supersymmetric method. But they contain
no Grassman variables! This is the principal advantage of the new sigma model: it
avoids all the complications of Grassman variables and graded matrices.

The conversion to these new matrices happens in two steps. The first is a
Hubbard-Stratonovich transformation which converts the Grassman variables in the
original model into a single $\frac{N}{2} \times \frac{N}{2}$ matrix, which I call the fermionic matrix even
though it contains no Grassman variables. This step is exact, and one arrives at a
theory in terms of the fermionic matrix and the original bosonic variables.

At this point I would like to convert the bosonic variables into a second $\frac{N}{2} \times \frac{N}{2}$
matrix, which I will call the bosonic matrix. It is possible to do this conversion
exactly only in zero-dimensional systems. In extended systems one cannot perform
the conversion without at the same time regularizing the field theory, which means 
requiring that the bosonic matrix may not vary appreciably between points which are 
close together. With the regularization in place, the conversion to bosonic variables 
becomes an exercise in evaluating an effective Lagrangian. Thus I arrive at the new 
theory in terms of a fermionic matrix and a bosonic matrix, both composed entirely 
of bosonic variables. This is still conceptually the same as the supersymmetric 
approach; the only differences are that in the supersymmetric approach one ends 
up with only one matrix, and it is a graded matrix. 

At this point one still has to do a complicated matrix integral, with one matrix 
per point in space. My next step (in section l6^ is again conceptually the same as 
in the supersymmetric approach: I make a saddle point approximation. Since the 
degrees of freedom are matrices, the saddle point equations are matrix equations, 
so I spend some time explaining how to take matrix derivatives. The saddle point 
approximation requires me to evaluate the second derivative (the Hessian) of the 
action. This is equivalent to a one loop integral, so I spend some time going through 
a detailed evaluation of the one loop integral. 

It is impossible to calculate the saddle point equations or the second derivative 
of the action without regularizing the fermionic matrix, which means requiring that 
the fermionic matrix may not vary appreciably between points which are close to- 
gether. The mathematical mechanism for introducing the regularization is called 
the diffusive approximation. Because I regularize the bosonic and fermionic matrices 
separately, I can use the regularizations to introduce a disparity between the two 
matrices. However, I can also choose to match the regularizations so that the end 
result shows a remarkable symmetry between the bosonic and fermionic matrices. 
In this way I would match the results of the supersymmetric method, where there 
is only one matrix and only one regularization, so it is natural (but not required) to 
regularize all of the matrix elements in an identical fashion. I would like to point out 
that the derivation of the new sigma model is no less rigorous than the derivation 
of the supersymmetric sigma model: in both cases one must regularize the fields, 
in both cases the details of the regularization procedure can have a large impact on 
the final result, and in both cases these details are represented in the final result 
via the introduction of new parameters whose values can not be rigorously derived 
from the original theory. 

After one has regularized the fields and found the saddle point, one finds that 
the saddle point has a continuous symmetry when certain parameters are set to 
zero. Therefore, there are Goldstone modes which have either zero mass or small 
mass, depending on whether the parameters are zero. In contrast to the Goldstone 
modes, there are other degrees of freedom which are always very massive, and 
the saddle point approximation constrains these degrees of freedom to have their 
equilibrium values. From a matrix point of view, this just amounts to requiring 
that the eigenvalues of the matrices have fixed values. In this way one arrives at
a sigma model - "sigma model" just means a field theory with a constraint on
the fields. The supersymmetric method and the new method discussed here do 
qualitatively the same thing - both find the same sort of saddle point, and derive 
a sigma model. However, the details are significantly different: in particular, the 
mathematics of factoring a graded matrix into massive modes and Goldstone modes 
is very challenging. 

After deriving the sigma model, one still has to do an integration over the
Goldstone modes. Both the supersymmetric model and the new model presented
here are able to do the Goldstone integration exactly when treating zero-dimensional
systems; this is why these models are reliable for non-perturbative results. In sections
6.6 and 6.7 I calculate two observables for a zero-dimensional system, the gaussian
unitary ensemble. Although the new model developed here is simpler than the
supersymmetric sigma model, there are still a fair number of details to attend to,
particularly in the calculation of the two point correlator. In zero dimensions it turns
out to be a bit easier to do the saddle point calculation after integrating out the
Goldstone modes, which is the reverse of the order followed by the supersymmetric
sigma model. I hope to redo the calculation in the more normal order, but that will
be done after this thesis. I also have plans to calculate an observable in an extended
system, but this too is being postponed.

6.2 The Original Problem 
6.2.1 The Ensemble 

The Hamiltonians in the ensemble which I calculate in this chapter live in a lattice
with $V$ different lattice points, or sites. The number of dimensions and other
structural details of the lattice are modeled by the kinetic energy matrix (operator)
$K$, which I will not specify at this time except to say that it is not a random
variable. In a zero-dimensional system $V = 1$ and $K = 0$. There are $N$ different
basis elements at each site, so the total basis size is $N \times V$. I use the lower
case letters $n, v$ to denote indices in the basis. The $v$ index would normally be
written as x or p, but my notation saves space. The kinetic energy matrix $K$ is
diagonal in the index $n$ and independent of $n$; I write this statement mathematically
as $K = K(v_1 v_2)\,\delta(n_1 n_2)$. In my notation the arguments in parentheses specify a
matrix or vector's indices, and there is an implied sum whenever two matrices or
vectors share an index. Throughout this chapter I will use the words "operator"
and "matrix" interchangeably.

The random matrices themselves are a sum $H + K$ of the kinetic operator $K$
with a random potential $H$. $H$ is diagonal in the spatial index; i.e. it is a "local"
potential. However, it does vary from site to site. Mathematically, this is written
as $H = H(n_1 n_2 v_1)\,\delta(v_1 v_2)$.

I require that H be hermitian. Mathematically the hermitian case is the simplest, 
but the model presented here should be easy to generalize to other ensembles. I 
choose the probability distribution: 

dPiH) =dHx (iV/2<2)'^''/'2^(^-i)^/2g-^Tr.(H^)^^^f ^ 3^ 

In this and all other uses of a matrix measure I use the following normalization 
convention: dH = {Y[idHii)([\i^j^dHl^dHlj^) . ^ has units of energy and sets the 
scale of the disorder. 

F is the power of the determinant, and should be an even number to make 
the probability distribution dP be positive definite. Physically a model with non- 
zero F corresponds to letting the disordered potential interact with F species of 
fermions; integrating out the fermions gives these determinants. I added these 
fermions to the ensemble in order to create a toy model of QCD, where one does 
indeed integrate out fermions and does obtain a probability distribution which is 
weighted by a determinant. 
Later in this chapter I will set $F = 0$ and thus obtain a standard sort of disorder.
This standard disorder is spatially uncorrelated; i.e. its value at one point is not
correlated with its value at any other point. It is also Gaussian, which implies that
its higher moments can be expressed as products of its second moment a la Wick's
theorem. It is easy to calculate the second moment of the $F = 0$ potential using
formula 6.3; the result is:



H {nin2ViV2)H {n3n4V3V4) = —S{nin4)S{n2n3)5{viV2V3V4) (6.4) 

When $F = 0$ and the number $N$ of degrees of freedom per site is equal to one, a
suitable choice of kinetic energy operator $K$ turns this ensemble into the Anderson
model, which is the usual starting point of the supersymmetric sigma model. On
the other hand, when $F = 0$, $V = 1$, and, of course, $K = 0$, this ensemble turns
into the gaussian unitary ensemble described earlier.

This completes the ensemble's specification. 
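
The index structure of this ensemble can be made concrete with a short numerical sketch, restricted to the determinant-free case ($F = 0$). Everything named here is an assumption of the illustration: the functions `sample_local_potential` and `total_hamiltonian`, the parameter `disorder_scale` (standing in for the energy scale of the disorder), the normalization of the variance, and the ordering of the basis as (site, local index).

```python
import numpy as np

def sample_local_potential(n_per_site, n_sites, disorder_scale, rng):
    """One independent hermitian N x N block per site (the F = 0 case).

    Assumed normalization for this sketch: each element of a block has
    mean-square magnitude disorder_scale**2 / n_per_site.
    """
    shape = (n_sites, n_per_site, n_per_site)
    a = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    return disorder_scale * (a + a.conj().transpose(0, 2, 1)) / np.sqrt(2 * n_per_site)

def total_hamiltonian(K, blocks):
    """Assemble h = H + K with basis index ordered as v * N + n."""
    n_sites, n, _ = blocks.shape
    # K(v1 v2) delta(n1 n2): the kinetic part acts trivially on the local index n.
    h = np.kron(K, np.eye(n)).astype(complex)
    # H(n1 n2 v1) delta(v1 v2): the random potential is diagonal in the site index v.
    for v in range(n_sites):
        s = slice(v * n, (v + 1) * n)
        h[s, s] += blocks[v]
    return h

rng = np.random.default_rng(0)
V, N = 8, 3
# Hypothetical kinetic operator: nearest-neighbor hopping on a ring of V sites.
K = np.roll(np.eye(V), 1, axis=1) + np.roll(np.eye(V), -1, axis=1)
h = total_hamiltonian(K, sample_local_potential(N, V, 1.0, rng))
print("basis size:", h.shape[0], "hermitian:", np.allclose(h, h.conj().T))
```

Calling `total_hamiltonian(np.zeros((1, 1)), sample_local_potential(N, 1, 1.0, rng))` gives the zero-dimensional case, a single hermitian $N \times N$ random matrix.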

6.2.2 The Observables. Their Relation to Averages of 
Green's Functions. 

In this chapter I calculate two observables: the density of states and the two point
correlator. The simpler of the two is the density of states $\rho(E)$. In a zero dimensional
system ($V = 1$), the density of states is defined as the number of eigenvalues
per unit energy. It is a function of energy $E$ because one will find more
eigenvalues in some parts of the spectrum than in others. Given a Hamiltonian
$h = H + K$, the zero dimensional density of states has the mathematical definition
$\rho(E) = \mathrm{Tr}(\delta(E - h))$.

In a spatially extended system ($V > 1$) one must generalize these formulas a
little bit. Define the local trace $\mathrm{Tr}_v$ as a trace over all the basis elements which
share the same spatial coordinate $v$. Thus the usual trace is a sum of local traces;
$\mathrm{Tr} = \sum_v \mathrm{Tr}_v$. With this formalism in place, I can define the local density of states:
$\rho(E, v) = \mathrm{Tr}_v(\delta(E - h))$. The global density of states is just a sum over the local
density of states; $\rho(E) = \sum_v \rho(E, v)$.
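
The local trace and the local density of states can be illustrated numerically by diagonalizing a single Hamiltonian. In this sketch the delta function is replaced by a Lorentzian of width `eta`, which is an assumption made only so that the result can be evaluated on a grid of energies; the function name `local_density_of_states` and the basis ordering (site index times $N$ plus local index) are also illustrative.

```python
import numpy as np

def local_density_of_states(h, n_per_site, energies, eta):
    """Return rho(E, v) = Tr_v delta(E - h) as an array of shape (len(energies), n_sites).

    The delta function is smeared into a Lorentzian of width eta
    (an assumption of this sketch).
    """
    eigvals, eigvecs = np.linalg.eigh(h)
    n_sites = h.shape[0] // n_per_site
    # weights[v, a]: probability of eigenvector a on site v, i.e. the local
    # trace of its projector over the n_per_site basis elements at that site.
    weights = (np.abs(eigvecs) ** 2).reshape(n_sites, n_per_site, -1).sum(axis=1)
    # Smeared delta function evaluated at every (energy, eigenvalue) pair.
    lorentz = (eta / np.pi) / ((energies[:, None] - eigvals[None, :]) ** 2 + eta ** 2)
    return lorentz @ weights.T

rng = np.random.default_rng(0)
a = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
h = (a + a.conj().T) / 2           # a small hermitian test matrix: 3 sites, N = 2
energies = np.linspace(-5.0, 5.0, 201)
ldos = local_density_of_states(h, 2, energies, 0.1)
global_dos = ldos.sum(axis=1)      # sum over sites gives the global density of states
print("states counted by the global density:", global_dos.sum() * (energies[1] - energies[0]))
```

Summing the returned array over its site axis reproduces the global density of states, in line with the statement that the usual trace is a sum of local traces.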

For the rest of this section I will discuss only the global density of states. In 
unlocalized systems, this quantity is sufficient for calculating observables based on 
the eigenvalues. If, however, spatial information is desired, one needs to use the 
local density of states. All of the formulas in this section are given in terms of 
the global density of states. However, all of them, with the exception of the gaussian
unitary ensemble results in equations 6.6 and 6.8, can be easily converted to the
local density of states by simply replacing the usual trace $\mathrm{Tr}$ with the local trace
$\mathrm{Tr}_v$.

I will be calculating the average of the density of states, averaging over the 
ensemble of random Hamiltonians $h = H + K$:

$$\bar{\rho}(E) = \overline{\mathrm{Tr}(\delta(E - h))} \tag{6.5}$$

The average density of states in the gaussian unitary ensemble is well known; 



^— N I E^ 



(6.6) 
Section 6.6 gives a detailed derivation of this result.

I will also calculate the two point correlator, which is defined as: 



$$R_2(E_1, E_2) = \frac{\overline{\rho(E_1)\,\rho(E_2)}}{\bar{\rho}(E_1)\,\bar{\rho}(E_2)} \tag{6.7}$$

Note immediately that the numerator is the average of the product of the two delta
functions, not the product of their separate averages; $R_2$ is precisely the former
divided by the latter. Its qualitative meaning is best understood by analyzing its
dependence on the energy difference $\omega = E_1 - E_2$. If eigenvalues are most often
separated by an energy $\omega_1$, then $R_2$ will have a peak at $\omega = \omega_1$. If, on the other
hand, eigenvalues are never separated by an energy $\omega_2$, then $R_2(\omega_2)$ will be exactly
zero. In other words, the two point correlator describes how eigenvalues prefer to
be spaced relative to each other. In the gaussian random matrix ensemble, when $\omega$
is small the two point correlator is given by the expression:

$$R_2(E_1, E_2) = \delta(\omega\bar{\rho}) + 1 - \frac{\sin^2(\pi\omega\bar{\rho})}{(\pi\omega\bar{\rho})^2}$$

This result, minus the delta function, will be derived in section 6.7. $\bar{\rho}$ is the density
of states at $E = (E_1 + E_2)/2$. The missing delta function is not physically significant;
it just represents the fact that $\rho(E_1)$ is perfectly correlated with $\rho(E_2)$ when $E_1 = E_2$.

The field theory approach to random matrix problems starts by reformulating 
observables in terms of Green's functions. The Green's function of a matrix h is 
defined by: 

$$G(E) = (E - h)^{-1} \tag{6.8}$$

This matrix function has poles at energies E equal to the eigenvalues of h. The poles 
encode in a mathematically convenient way a precise description of h's eigenvalues 
and eigenvectors, which is one of the biggest reasons why Green's functions are so 
popular with physicists. In order to set Green's functions on a firm mathematical 
footing, one must add an infinitesimally small imaginary part $i\epsilon$ to the energy. This
moves the poles slightly off the real axis. If the imaginary part is negative, the poles
move into the positive imaginary half plane. The corresponding Green's function
$G_A(E) = (E - i\epsilon - h)^{-1}$ is called the advanced Green's function. If instead the
imaginary part is positive, then the poles move into the negative imaginary half
plane and one has the retarded Green's function $G_R(E) = (E + i\epsilon - h)^{-1}$. Clearly
the advanced and retarded Green's functions are related by complex conjugation:
$G_A(E) = G_R^*(E)$.
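
The pole structure can be illustrated with a small numerical sketch. The random Hermitian matrix `h`, the values of `E` and `eps`, and the helper `green` are illustrative assumptions. For a complex Hermitian `h` the conjugation relating the two Green's functions includes a transpose; the traces are related by plain complex conjugation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
h = (a + a.conj().T) / 2                 # random Hermitian test matrix
eigenvalues = np.linalg.eigvalsh(h)


def green(energy, eps, sign):
    """(E + sign*i*eps - h)^{-1}; sign=-1 is 'advanced', sign=+1 'retarded' in the convention used here."""
    return np.linalg.inv((energy + sign * 1j * eps) * np.eye(n) - h)


E, eps = 0.3, 1e-2
g_adv = green(E, eps, -1)
g_ret = green(E, eps, +1)

# In the eigenbasis of h the trace is a sum of simple poles at the eigenvalues.
trace_from_poles = np.sum(1.0 / (E - 1j * eps - eigenvalues))
print(np.trace(g_adv), trace_from_poles)
```

The imaginary part of the advanced trace is a sum of Lorentzians of width `eps` centred on the eigenvalues, which is why it carries the density of states in the limit of small `eps`.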

The density of states is related to the Green's functions via the following equa- 
tion: 

piE) = lim Im{Tr{GAiE))) (6.9) 

One proves equation 6.9 by changing to the basis which diagonalizes $h$ and then

applying the identity 5{E ~ E) = lime^^o Im{E — le — E) . This parameter- 
ization of the 5 function can be verified by checking that for every function f{E) 
the following equation holds: 

f f{E)^ lim Im{{E - le - Ef') = f{E) (6.10) 

J^oc 27r e^O 



The check also involves noticing that the integrand is nonzero only where the two
energies coincide.

Following equation 6.9, I will calculate the average level density by calculating
the average of the trace of the advanced Green's function, taking its imaginary part,
and then dividing by $2\pi$. A similar strategy can be used to calculate the two point
correlator, which can be rewritten in terms of products of two Green's functions:

p{E)piE) = ^\unIm(Tr{GA{E)))Im{Tr{GA{m 

= lim[(Tr(G^(i?)) - Tr{GR{E))) x (Tr(G^(i)) - Tr{Gn{E)))] 

= -^^\im]Tr{GA{E))Tr{GA{E)) + Tr{GR{E))Tr{GR{E)) 
~Tr{GA{E))Tr{GR{E)) - Tr(GR{E))Tr{GA{E))] 

^ ^^}^^MTr{GA{E))Tr{GR{E))) 

-Re{Tr{GA{E))Tr{GA{E)))] (6.11) 

I define the advanced-advanced two point correlator
$R_{AA} = \lim_{\epsilon \to 0} \overline{\mathrm{Tr}(G_A(E_1))\,\mathrm{Tr}(G_A(E_2))}$ and the advanced-retarded two point
correlator $R_{AR} = \lim_{\epsilon \to 0} \overline{\mathrm{Tr}(G_A(E_1))\,\mathrm{Tr}(G_R(E_2))}$. I will calculate $R_{AR}$ in section 6.7.1
and $R_{AA}$ in section 6.7.2. With these results in hand, the following formula gives
the two point correlator:

$$R_2(E_1, E_2) = \frac{\mathrm{Re}(R_{AR}) - \mathrm{Re}(R_{AA})}{8\pi^2\,\bar{\rho}(E_1)\,\bar{\rho}(E_2)} \tag{6.12}$$

6.2.3 The Generating Function 

I will be calculating averages of an advanced Green's function, of the product of
two advanced Green's functions, and of the product of an advanced and a retarded
Green's function. When calculating other observables, one could end up averaging
the product of any number of advanced and retarded Green's functions. One
powerful way of calculating products of Green's functions is to calculate instead a
generating function whose derivatives are the desired products of Green's functions.
The nicest advantages of this approach are that it allows one to avoid repeating
certain steps for every observable one calculates, and also that many sorts of
mistakes are easier to detect. In this section I introduce a generating function which
is suitable for calculating the observables that I'm interested in. This generating
function is the mathematical starting point of my calculations; in fact most of this
chapter will be spent on various manipulations and approximations of the generating
function, and only at the end will I take derivatives and obtain the observables.

First let me present a generating function which is sufficient for generating a 
single Green's function: 

$$Z(E, \hat{E}, J) = \frac{\det(E - H - K)}{\det(\hat{E} - H - K - J)} \tag{6.13}$$

$J$ is a source matrix $J = J(n_1 n_2 v_1 v_2)$. Note that when the source $J$ is set to zero
and the two energies are set to be equal, the generating function $Z(\hat{E} = E, J = 0)$
is equal to one. Using the identity relating the determinant of a matrix to its
logarithm, $\det(A) = e^{\mathrm{Tr}(\ln(A))}$, one can easily prove that:

$$\frac{dZ}{dJ(n_1 n_2 v_1 v_2)} = Z \times \langle n_2 v_2 | (\hat{E} - H - K - J)^{-1} | n_1 v_1 \rangle \tag{6.14}$$

(If you are not familiar with how to take this sort of derivative, skip ahead for a
moment to section 6.5.1 on matrix derivatives, and then come back.) Therefore,
one can calculate the Green's function by first taking the derivative with respect to $J$
and then setting $J = 0$ and $\hat{E} = E$:

$$\left[\frac{dZ}{dJ(n_1 n_2 v_1 v_2)}\right]_{J=0,\,\hat{E}=E} = \langle n_2 v_2 | (E - H - K)^{-1} | n_1 v_1 \rangle = \langle n_2 v_2 | G(E) | n_1 v_1 \rangle \tag{6.15}$$

As I discussed in the last section, I typically want the local trace of the Green's
function, which can be obtained by making the source $J$ diagonal in indices $n$ and
$v$ and independent of the index $n$; $J = J(v)\,\delta(n_1 n_2)\,\delta(v_1 v_2)$. With this choice of
source, one has:

$$\left[\frac{dZ}{dJ(v)}\right]_{J=0,\,\hat{E}=E} = \mathrm{Tr}_v(G(E)) \tag{6.16}$$

This is exactly the result that is needed.
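
A finite-difference check makes the derivative rule concrete. This is a sketch under stated assumptions: a single site ($V = 1$), a real symmetric matrix `h` standing in for $H + K$, a complex energy `E` chosen so that $E - h$ is safely invertible, and both energies of the generating function already set equal. The names `Z`, `dZ_element`, and `dZ_trace` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
a = rng.normal(size=(n, n))
h = (a + a.T) / 2                        # stands in for H + K
E = 0.4 - 0.3j                           # complex energy keeps E - h invertible
identity = np.eye(n)


def Z(J):
    """det(E - h) / det(E - h - J), i.e. the generating function with both energies equal."""
    return np.linalg.det(E * identity - h) / np.linalg.det(E * identity - h - J)


G = np.linalg.inv(E * identity - h)

# Derivative with respect to a single source element J[n1, n2].
n1, n2, step = 1, 3, 1e-6
dJ = np.zeros((n, n))
dJ[n1, n2] = step
dZ_element = (Z(dJ) - Z(-dJ)) / (2 * step)

# Derivative with respect to a source proportional to the identity.
dZ_trace = (Z(step * identity) - Z(-step * identity)) / (2 * step)

print(dZ_element, G[n2, n1])
print(dZ_trace, np.trace(G))
```

The element derivative returns the Green's function with its indices transposed relative to the source, and the diagonal source returns the trace.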

How can this approach be generalized to obtain the product of two traces of
Green's functions? Taking a second derivative of the generating function in equation
6.13 results in:

$$\left[\frac{d^2 Z}{dJ(v_1)\,dJ(v_2)}\right]_{J=0,\,\hat{E}=E} = \mathrm{Tr}_{v_1}(G(E))\,\mathrm{Tr}_{v_2}(G(E)) + \langle n_2 v_2 | G(E) | n_1 v_1 \rangle \langle n_1 v_1 | G(E) | n_2 v_2 \rangle \tag{6.17}$$

Note that the indices $v_1$ and $v_2$ are not summed over although they are repeated,
while $n_1$ and $n_2$ are summed over. The first term in equation 6.17 is almost what
we want, except that it would be better to have two different Green's functions with
two different energies. The second term is not wanted at all.
A better approach is to use the following generating function:

$$Z(\hat{E}_1, J_1, \hat{E}_2, J_2) = \frac{\det(E_1 - H - K)\,\det(E_2 - H - K)}{\det(\hat{E}_1 - H - K - J_1)\,\det(\hat{E}_2 - H - K - J_2)} \tag{6.18}$$

With this generating function, one obtains the desired result:

$$\left[\frac{d^2 Z}{dJ_1(v_1)\,dJ_2(v_2)}\right]_{J_1=J_2=0,\,\hat{E}_1=E_1,\,\hat{E}_2=E_2} = \mathrm{Tr}_{v_1}(G(E_1)) \times \mathrm{Tr}_{v_2}(G(E_2)) \tag{6.19}$$



Proceeding along the same lines, it is obvious that in order to produce the product
of $X$ Green's functions one must use a generating function with $X$ determinants in
the numerator and $X$ determinants in the denominator.
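
The two-determinant construction can also be checked by finite differences. In this sketch the system has a single site, the sources `J1` and `J2` are scalars multiplying the identity, and the energies in the denominator are already set equal to those in the numerator; the matrix `h`, the energies, and the step `s` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
a = rng.normal(size=(n, n))
h = (a + a.T) / 2                        # stands in for H + K
E1, E2 = 0.2 - 0.5j, -0.7 + 0.4j         # two different complex energies
identity = np.eye(n)


def Z2(J1, J2):
    """Ratio of two numerator and two denominator determinants with scalar sources."""
    numerator = np.linalg.det(E1 * identity - h) * np.linalg.det(E2 * identity - h)
    denominator = (np.linalg.det((E1 - J1) * identity - h)
                   * np.linalg.det((E2 - J2) * identity - h))
    return numerator / denominator


s = 1e-4
# Central finite difference for the mixed second derivative at J1 = J2 = 0.
mixed = (Z2(s, s) - Z2(s, -s) - Z2(-s, s) + Z2(-s, -s)) / (4 * s * s)
expected = (np.trace(np.linalg.inv(E1 * identity - h))
            * np.trace(np.linalg.inv(E2 * identity - h)))
print(mixed, expected)
```

Because each source appears in only one determinant, the mixed derivative factorizes into a product of two traces and no unwanted cross term appears.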

At this point I would like to remind the reader of the probability distribution
$dP(H)$ of the random matrix ensemble that I'm interested in, which
was given in equation 6.3. It included a factor of $\det^F(H + K)$, which represented
the addition of $F$ species of fermions to the usual gaussian ensemble. I will be
calculating the average $\bar{Z} = \int dP(H)\, Z$ of the generating function. I would like to
point out that the extra determinants in $dP(H)$ can be moved into the generating
function $Z$. This doesn't complicate the generating function much, since it was
already composed of determinants. However it simplifies $dP(H)$ a lot, reducing it
to a gaussian function.

To repeat, I am saying that averaging a generating function with X determinants 
in the numerator and X determinants in the denominator in an ensemble with 
F extra fermions is equivalent to averaging a generating function with X + F 
determinants in the numerator and X determinants in the denominator in a gaussian 
ensemble. Therefore I am interested in calculating a generating function with $I^f = X + F$
determinants in the numerator and $I^b = X$ determinants in the denominator.

Instead of keeping track of all those determinants separately, it is convenient to 
combine them into one determinant in the numerator and one in the denominator. 
Here is an example of the basic concept: Suppose that I have two matrices A and 
B which each inhabit a basis with M basis elements, and I want to combine the 
determinants det {A) and det (B) into one determinant. I therefore invent a new 
basis with 2M basis elements, and then assert that M of the new basis elements 
correspond exactly with A's old basis and that the other M new basis elements 
correspond exactly with B's old basis elements. In the new basis $AB = BA = 0$
and therefore $\det(A) \times \det(B) = \det(A + B)$. This is sleight of hand: I have
gotten rid of the complication of having two determinants at the expense of using 
a basis which is twice as large. 
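
The doubling trick is easy to verify numerically. In the sketch below the matrices `A` and `B` are random and the names `A_big` and `B_big` for their embeddings in the doubled basis are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
M = 3
A = rng.normal(size=(M, M))
B = rng.normal(size=(M, M))
zero = np.zeros((M, M))

# Embed A in the first M elements of the doubled basis and B in the last M.
A_big = np.block([[A, zero], [zero, zero]])
B_big = np.block([[zero, zero], [zero, B]])

lhs = np.linalg.det(A) * np.linalg.det(B)
rhs = np.linalg.det(A_big + B_big)
print(lhs, rhs)
```

The embedded matrices multiply to zero in either order, and the single determinant in the doubled basis equals the product of the two original determinants.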

I now apply this trick to the case of $I^f$ determinants in the numerator and $I^b$
determinants in the denominator, with each determinant inhabiting a basis with
$N \times V$ basis elements. I create two new bases. For the numerator I create a basis
with $N \times V \times I^f$ basis elements; there are $N \times I^f$ basis elements living at each
site, and $V$ sites. I use the indices $nvi$ to denote the numerator's basis elements.
For the denominator I create a basis with $N \times V \times I^b$ basis elements, and denote
them with the indices $nvj$. I choose to use the index $j$ for the denominator and the
index $i$ for the numerator in a deliberate effort to contrast the two bases.

I now adjust my matrices to match the new bases. All the new matrices
must be diagonal in the indices $i$ and $j$. The kinetic energy $K$ and the random
potential $H$ must be independent of $i$ and $j$; the kinetic matrix in the numerator
is $K = K(v_1 v_2)\,\delta(n_1 n_2)\,\delta(i_1 i_2)$ and the potential in the numerator is
$H = H(n_1 n_2 v_1)\,\delta(v_1 v_2)\,\delta(i_1 i_2)$. The kinetic matrix and the potential in the
denominator have exactly the same form, but with $j$ substituted for $i$. The source $J$ occurs
only in the denominator, and has the structure $J = J(v_1 j_1)\,\delta(n_1 n_2)\,\delta(v_1 v_2)\,\delta(j_1 j_2)$.

There were $I^f$ determinants in the numerator, each with a different energy.
(Of course, $F$ of the energies will be set to zero.) I denote the energies in the
numerator with the notation $\{E^f_i\} = \{E^f_1, \ldots, E^f_{I^f}\}$. Likewise there are $I^b$ energies
associated with the denominator, which I denote with $\{E^b_j\} = \{E^b_1, \ldots, E^b_{I^b}\}$. These
energies then define two energy matrices: $E^f(n_1 n_2 v_1 v_2 i_1 i_2) = E^f_{i_1}\,\delta(n_1 n_2)\,\delta(v_1 v_2)\,\delta(i_1 i_2)$, and
$E^b(n_1 n_2 v_1 v_2 j_1 j_2) = E^b_{j_1}\,\delta(n_1 n_2)\,\delta(v_1 v_2)\,\delta(j_1 j_2)$.

Now that I have defined all the matrices, I can write the generating function: 

$$Z(E^f, E^b, J) = \frac{\det(E^f - H - K)}{\det(E^b - H - K - J)} \tag{6.20}$$



This is the most general generating function which I will consider, and I will 
spend the rest of this chapter evaluating its average and derivatives. It can generate 
products of up to $I^b$ different Green's functions. After taking the derivatives, one
sets $I^b$ of the energies $E^f$ in the numerator to be equal to the energies $E^b$ in
the denominator. The remaining energies (if any) in the numerator are set to zero.

6.2.4 The Supersymmetric Formula for the Generating
Function

There are several ways of converting the generating function $Z$ into a field theory.
Here I concentrate on the supersymmetric method, which converts the determinant
in the numerator into an integral over Grassmann variables, and the determinant in
the denominator into an integral over bosonic variables. I now briefly state the two
identities used to perform this conversion. If $A$ is an $M \times M$ matrix and $\bar{\psi}, \psi$ are
vectors of $M$ Grassmann variables, then:

$$\det(A) = (-2i)^M \int d\bar{\psi}\, d\psi\; e^{\frac{i}{2}\bar{\psi} A \psi} \tag{6.21}$$

I am using the convention $d\bar{\psi}\, d\psi = \prod_{m=1}^{M} d\bar{\psi}_m\, d\psi_m$. One can find a similar result with
vectors $S$ composed of $M$ complex bosonic variables ($S = S^R + iS^I$):

$$\det{}^{-1}(A) = (2\pi i)^{-M} \int dS^R\, dS^I\; e^{\frac{i}{2} S A S^*} \tag{6.22}$$

However the imaginary part of A must be positive definite in order for the bosonic 
integral to converge. This seemingly easy point is actually quite deep and the cause 
of many of the interesting mathematical intricacies in the supersymmetric model. 
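
The convergence condition can be seen in the simplest case of a single bosonic variable, where the identity reads $\int dS^R\, dS^I\, e^{\frac{i}{2} a |S|^2} = 2\pi i / a$ for $\mathrm{Im}(a) > 0$. The sketch below checks this numerically after switching to the radial variable $u = |S|^2$; the value of `a`, the cutoff, and the grid are illustrative assumptions.

```python
import numpy as np

a = 1.0 + 0.5j                           # Im(a) > 0, so the integrand decays like exp(-0.25 u)
u = np.linspace(0.0, 200.0, 400001)      # u = |S|^2; the tail beyond u = 200 is negligible
integrand = np.exp(0.5j * a * u)
du = u[1] - u[0]

# Trapezoid rule written out explicitly.
radial = du * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

# dS^R dS^I = (1/2) du dtheta, and the angular integral gives 2*pi.
integral = np.pi * radial
inverse_det = integral / (2j * np.pi)
print(inverse_det, 1.0 / a)
```

If the imaginary part of `a` were zero or negative, the radial integrand would no longer decay and the integral would not converge, which is the scalar version of the positivity requirement on the imaginary part of $A$.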
Using these identities, I rewrite the generating function in supersymmetric form: 

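$$Z = (2\pi)^{-NVI^b}\,(i/2)^{-NVI^f}\,(i)^{-NV\,\mathrm{Tr}(L)} \int d\bar{\psi}\, d\psi\, dS^R\, dS^I\; e^{\frac{i}{2} S L (E^b - H - K - J) S^*}\, e^{\frac{i}{2} \bar{\psi} (E^f - H - K) \psi} \tag{6.23}$$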

$\bar{\psi}$ and $\psi$ are vectors composed of $N \times V \times I^f$ Grassmann variables, while $S$ is a vector
composed of $N \times V \times I^b$ complex bosonic variables. Examination of the exponents
reveals that $\psi$ and $S$ must have dimensions of $[\mathrm{Energy}]^{-1/2}$. The differentials cause
the generating function $Z$ to have dimensions of $[\mathrm{Energy}]^{NV(I^f - I^b)}$, but this can
be ignored without ill effect.

$L$ is the diagonal matrix with all diagonal elements determined by the sign of
the imaginary parts of the bosonic energies;

$$L(n_1 n_2 v_1 v_2 j_1 j_2) = \mathrm{sign}(\mathrm{Im}(E^b_{j_1}))\,\delta(n_1 n_2)\,\delta(v_1 v_2)\,\delta(j_1 j_2) \tag{6.24}$$

One has to introduce this matrix $L$ in order to ensure the convergence of the bosonic
integrals. I am assuming, as is usual, that the operators $H$, $K$, and $J$ are real.

One could also introduce a sign matrix similar to L in the fermionic integrals, 
and their guaranteed convergence allows one to choose any combination of signs 
one likes. Verbaarschot et al [73] explored this freedom in the context of the 
supersymmetric sigma model, and discovered that in that context one must choose 
the signs to all be the same, and thus obtain a compact representation for the 
fermionic variables. In the method I am presenting here, one can choose any signs
one likes, but pretty soon (just before equation 6.38) this fermionic sign matrix
factors out entirely, and one is again forced to use a compact representation.

6.2.5 Averaging Over the Disorder 

I want to calculate the average of products of Green's functions, so the next step
is to average the generating function $Z$ over realizations of the random potential,
averaging according to the probability distribution specified in equation 6.3. This
would be difficult if I had not moved the fermion factor $\det^F(H + K)$ into the
generating function. But with the move accomplished, I just have to average over a
gaussian distribution, which is easy enough. One expands formula 6.23 as a power
series in the random potential $H$, counts pairings of $H$, and then substitutes the
second moment specified in equation 6.4 for each pairing. The term to be averaged
is:

$$e^{\frac{i}{2}(SLHS^* + \bar{\psi}H\psi)} = \sum_{a=0}^{\infty} \frac{1}{a!} \left(\frac{i}{2}\right)^a (SLHS^* + \bar{\psi}H\psi)^a \tag{6.25}$$

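Remembering that odd moments of $H$ average to zero and that the moment of order $2a$
averages to the number of pairings $N_p(2a)$ times the $a$-th power of the second moment, one obtains:

$$\overline{e^{\frac{i}{2}(SLHS^* + \bar{\psi}H\psi)}} = \sum_{a=0}^{\infty} \frac{N_p(2a)}{(2a)!} \left(\frac{i}{2}\right)^{2a} \left[\,\overline{(SLHS^* + \bar{\psi}H\psi)^2}\,\right]^a \tag{6.26}$$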

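$N_p(2a)$ is the number of distinct ways of pairing $2a$ objects. It can be calculated
as follows:

$$N_p(2a) = \left[\frac{d^{2a}}{dJ^{2a}} e^{J^2/2}\right]_{J=0} = (2\pi)^{-1/2} \left[\frac{d^{2a}}{dJ^{2a}} \int d\sigma\, e^{-\sigma^2/2 + J\sigma}\right]_{J=0} = (2\pi)^{-1/2} \int d\sigma\, \sigma^{2a} e^{-\sigma^2/2} = (2a - 1)!! \tag{6.27}$$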



(Thanks to John Keating for helping me with this.) 
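
The counting of pairings can be confirmed by brute force for small cases. The sketch below enumerates every way of splitting $2a$ labelled objects into unordered pairs and compares the count with $(2a - 1)!! = (2a)!/(2^a a!)$; the function names are illustrative.

```python
from math import factorial


def enumerate_pairings(items):
    """Yield every way of splitting the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first = items[0]
    for k in range(1, len(items)):
        partner = items[k]
        rest = items[1:k] + items[k + 1:]
        for tail in enumerate_pairings(rest):
            yield [(first, partner)] + tail


def double_factorial_odd(a):
    """(2a - 1)!! written as (2a)! / (2^a a!)."""
    return factorial(2 * a) // (2 ** a * factorial(a))


counts = [sum(1 for _ in enumerate_pairings(list(range(2 * a)))) for a in range(5)]
formula = [double_factorial_odd(a) for a in range(5)]
print(counts, formula)
```

The ratio $N_p(2a)/(2a)! = 1/(2^a a!)$ is what allows the series over even moments to be resummed into an exponential.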
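Resumming the series, one finds:

$$\overline{e^{\frac{i}{2}(SLHS^* + \bar{\psi}H\psi)}} = \exp\left(-\frac{1}{8}\,\overline{(SLHS^* + \bar{\psi}H\psi)^2}\right) \tag{6.28}$$
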
Turning now to evaluating the second moment, I have to evaluate three averages. 
The first is: 

2 

{SLHS*) = ■^S{niViji)L{ji)S* {n2Viji)S{n2Vij2)L{j2)S* {nivij2) 

= j;^Tr{SLSL) (6.29) 

I define the new quantity $\hat{S}$ as:

$$\hat{S}(v_1 v_2 j_1 j_2) = S^*(n_1 v_1 j_1)\, S(n_1 v_1 j_2)\, \delta(v_1 v_2) \tag{6.30}$$

Remember that summations are implied wherever an index is repeated. Note that
$\hat{S}$ has no dependence on $n$ and that the trace does not sum over $n$.
The second average is: 



— - - ^2 _ 

(ipHij) = —■tp{riiViii)-i}}{ri2Viii)-i}}{ri2Vii2)->}}{niVii2) = -—Tritptp) (6.31) 

where $\hat{\psi}$ is defined by $\hat{\psi}(v_1 v_2 i_1 i_2) = \bar{\psi}(n_1 v_1 i_1)\,\psi(n_1 v_1 i_2)\,\delta(v_1 v_2)$. The minus sign
will be important in subsequent calculations and was caused by the anticommutation
of fermions.

The third average is:



2SLHS*^Hi; = ^V(niUiii)V(n2?^iii)5(n2t;iii)5*(nit;iii) = (6.32) 

Combining these results, I find that the averaged generating function is:

$$\bar{Z} = \gamma \int d\bar{\psi}\, d\psi\, dS^R\, dS^I \exp(\mathcal{L}) \tag{6.33}$$

where the newly introduced Lagrangian $\mathcal{L}$ and prefactor $\gamma$ are:

£ = ^SL{E^ -K- J)S* + y){E^ - K)^ 

- ^{-Tt{U) + TriSLSL) + 2X) 

7 ^ (2^)-^^^''(t/2)-^^^'(z)-^^'^'-(^) (6.34) 

This Lagrangian and prefactor are the starting point of the supersymmetric 
method. 

6.3 Conversion of the Theory to Matrix Variables

The main content of the supersymmetric method consists in first converting from
the theory in terms of vector variables $S$ and $\psi$ to a theory in terms of graded
matrices, and then performing various approximations in order to arrive at an integral
which can be evaluated. Traditionally the process begins by transforming $S$ and
$\psi$ to graded matrices. I will now begin to diverge from the traditional path, and
transform $S$ and $\psi$ separately, obtaining a theory in terms of matrices which contain
only bosonic variables. However, I will still be following the overall strategy of
converting to matrix variables and then performing the same approximations that
are used by the supersymmetric method.

6.3.1 Hubbard-Stratonovich Conversion of the Fermionic
Variables

First, I do an exact transformation of the fermionic vector $\psi$ into a bosonic $I^f \times I^f$
matrix $Q^f$. This step is called a Hubbard-Stratonovich transformation because
it involves using a gaussian integral to rewrite a quartic term as a quadratic term
coupled to a new field variable. Note that if $Q^f$ is an $I^f \times I^f$ Hermitian matrix,

$$(N/2\pi)^{I^f I^f/2}\; 2^{I^f(I^f - 1)/2} \int dQ^f\, e^{-\frac{N}{2}\mathrm{Tr}(Q^f Q^f)} = 1 \tag{6.35}$$

Using this identity and completing the square, one obtains: 

t2 



exp 



j dQeM~Tr{Qf^) + ^Tr{Q^)) (6.36) 
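
The mechanism of completing the square can be illustrated with a one-variable analogue, $e^{b^2/(2N)} = \sqrt{N/2\pi} \int dq\, e^{-\frac{N}{2} q^2 + b q}$, in which the real number $b$ stands in for the bilinear that the new field couples to. This scalar form, the values of `N` and `b`, and the integration grid are illustrative assumptions and not the matrix identity itself.

```python
import numpy as np

N, b = 3.0, 0.8                          # illustrative values
q = np.linspace(-20.0, 20.0, 200001)
dq = q[1] - q[0]
weights = np.exp(-0.5 * N * q**2 + b * q)

# Trapezoid rule written out explicitly.
integral = dq * (weights.sum() - 0.5 * (weights[0] + weights[-1]))

lhs = np.sqrt(N / (2 * np.pi)) * integral   # gaussian integral linear in b
rhs = np.exp(b**2 / (2 * N))                # term quadratic in b that it replaces
print(lhs, rhs)
```

Read from right to left, the identity trades a term quadratic in $b$ (quartic in the original vector variables) for a term linear in $b$ coupled to the auxiliary variable $q$.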
I apply this transformation at each point in the spatial lattice, with a different
$Q^f$ at each point. Thus I alter the equations for the generating function:

Z = -fj dQ^d^d^pdS^dS^expiC) 

L = ^SL{E'' -K~ J)S* + ^-il){Ef - K)^ 

- f^{Tr{SLSL) + 2X) - -Tr{Qf ) + ^Tr{Qf^) 

7 = (27r)-^^^\z/2)-^"^'w-^^^'^(^)(iV/27r)^''^/^2^'(^'-i)^/2(6.37) 

The new Lagrangian is linear in $\bar{\psi}$ and $\psi$, so I can use equation 6.21 to integrate
out these variables. I get:

Z = 



7 = 

A{nin2ViV2iii2) ^ 

+ 
+ 

6.3.2 Effective Lagrangian for the Bosonic Variables 

I just integrated out the fermionic vectors $\bar{\psi}$ and $\psi$ and obtained the matrix $Q^f$,
which I will call the fermionic matrix even though it is composed of bosonic variables.
I would like to do the same sort of thing for the bosonic variables $S$ and obtain
a matrix $Q^b$. However the Hubbard-Stratonovich trick I used before won't work
anymore, because the logarithm in the Lagrangian contains all even powers of $S$.
Therefore I must simply introduce an $I^b \times I^b$ bosonic matrix $Q^b$ at each site and
then integrate out $S$. Introducing the matrix is an exact step:



7 J dQf dS^dS^eyip{C) 

'-SL{E^ -K~ J)S* - ^Tr{SLSL) 
^Tr{Q^^)+Tr{\nA) 

(2^)-^^^\z)-^^^'^(^)(7V/2^)^''^/^2^'(^'-i)^/2 

+i-^S{niViii)L{ji)S*{n2Viji)5{viV2)5{iii2) 
i^Q-^ {viiii2)S{nin2)S{viV2) 

(Ef(ii)S(viV2)-K( viV2))d(nin2)S(iii2) (6.38) 



$$Z = \gamma \int dQ^f\, dQ^b\, dS^R\, dS^I\; \delta(Q^b - \hat{S}) \exp(\mathcal{L}) \tag{6.39}$$

$\hat{S}$ was defined in equation 6.30. I use the convention:

5{Q' - 5) ^ n ^^Qrj - Sn )\{ '^(Q " " Sf^) (6-40) 

3 jk 

In order to complete the conversion to matrix degrees of freedom, I must get
rid of the $S$ variables by calculating the effective Lagrangian $\mathcal{L}_{eff}$.

Z = jj dQf dQ''exp{C) 

^ = Ceff + ^Tr{Q''LQ''L)^^Tr{Qf') 

A{nin2ViV2iit2) = +t-^S{niViji)L{ji)S* {n2Viji)S{viV2)5{iii2) 
+ i^Q-^{viiii2)S{nin2)6{viV2) 

+ [Ef{ii)5{viV2)-K{viV2))5{nin2)5{ixi2) (6.41) 

Fyodorov showed how to calculate this effective Lagrangian exactly in a zero
dimensional ($V = 1$) system [9]. Calculating $\mathcal{L}_{eff}$ non-perturbatively in systems
with more than one site is much more difficult, and has not been done yet. In
a system which is a continuum instead of a lattice, it is not clear whether the
integral is even well defined. The source of difficulty, both on the lattice and on the
continuum, is that the $S$ variables at neighboring sites are allowed to vary wildly
with respect to each other, and the Lagrangian's value at any given point depends
on every one of the wildly varying $S$ variables in the system.

In order to evaluate the continuum effective Lagrangian, one must regularize $S$,
requiring that it not vary significantly between nearby points. In the next section
I will introduce a regularization of $S$. Having applied this regularization, I will do
a non-perturbative calculation of the effective Lagrangian in a diffusive limit where
the fields are permitted to vary only over very long distance scales. I will then use
effective Lagrangian arguments to determine the leading corrections to the diffusive
Lagrangian. This procedure will give me both the non-perturbative information
necessary to determine the saddle point and also the perturbative information necessary
to determine the effective Lagrangian of the final sigma model.



6.3.3 Diffusive Limit of the Effective Lagrangian 

I will evaluate the diffusive limit of the effective Lagrangian given in equation 6.41
by first using a diffusive approximation to move the determinant outside of the
integral, then regularizing the theory, and then evaluating the remaining integral in
a diffusive approximation.



6.3.3.1 The Determinant 

First let's address the determinant of $A$. This is hard to evaluate because all of the
indices are coupled: the $SLS^*$ term couples $n$ to $v$, while the $Q^f$ term couples $i$
to $v$ and the kinetic term $K$ ensures that different points $v_1$ and $v_2$ are coupled.
However, if we assume that $S$ does not depend on the spatial index $v$, then the index
$n$ decouples from everything else, and one can begin simplifying the determinant.
We choose a unitary transformation $U$ which diagonalizes $A_0 \to U A_0 U^\dagger$, the part of
$A$ with a spatial dependence:

Aq = t^Q^{viiii2)S{nin2)S{viV2) 

+ [Ef {ii)5{viV2) - K{viV2))5{mn2)5{ixi2) (6.42) 

Then the determinant decomposes into $I^f V$ small determinants, one for each
eigenvalue $\lambda$ of $A_0$:

det{A) =J\det{\ + i^S{nin)L{j^)S*{n2ji)) (6.43) 

A 

Note that these small determinants are each in very small bases, each basis having 
only N basis elements. Now we're in business; 

det{A) = n (A)^exp(Tr(ln(l + ^^Sin^n)Lin)S* {712^)))) (6.44) 

A 

Expanding the logarithm as a power series, it becomes obvious that: 

det{A) = l[{\fexp{Tr{\nil+z^S*{mn)Sin,j2)Lij2m 

A 

= [](A)^exp(Tr(ln(l+^^5i))) (6.45) 

A 

The trace has now changed from being over the $n$ index to being over the $j$ index;
all the $S$ dependence is now hidden in the matrix $\hat{S}$, which is what I was aiming
for. Now to re-insert the dependence on $v$ and $i$:

det{A) = (det(Ao))^~-^ det(Ao + ^^SL) (6.46) 

The first determinant is in the $vi$ basis, while the second is in the $vij$ basis.
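
The change of the trace from the $n$ index to the $j$ index is an instance of the general identity $\det(1 + AB) = \det(1 + BA)$ for rectangular matrices. The sketch below checks it for a random $N \times I^b$ array `S`, a sign matrix `L`, and an arbitrary complex coefficient `c`; all of these values are illustrative assumptions and `c` does not reproduce the coefficient used in this chapter.

```python
import numpy as np

rng = np.random.default_rng(5)
N, Ib = 7, 2
S = rng.normal(size=(N, Ib)) + 1j * rng.normal(size=(N, Ib))
L = np.diag([1.0, -1.0])                 # one advanced and one retarded sign
c = 0.3j                                 # placeholder coefficient

# Determinant in the N-dimensional n basis.
big = np.linalg.det(np.eye(N) + c * S @ L @ S.conj().T)

# The same determinant in the Ib-dimensional j basis.
small = np.linalg.det(np.eye(Ib) + c * S.conj().T @ S @ L)
print(big, small)
```

The $N \times N$ determinant collapses to an $I^b \times I^b$ one that depends on `S` only through the combination `S.conj().T @ S`, which is the role played by $\hat{S}$.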

This simplification relied on the approximation that $S$ does not depend on
the spatial index $v$, an approximation which I will now discuss in more detail.
First, some new notation: I divide the $SLS^*$ term into a translationally invariant
part $\overline{SLS^*}$ and a fluctuating part $\delta S$, and define a Green's function $g =$

- K + t^Qf + i^SLS*) . In this notation, one has: 

$$\det(A) = \det(g^{-1} + \delta S) = \det(g^{-1})\,\det(1 + g\,\delta S) \tag{6.47}$$

In mathematical terms, the approximation consisted of two steps:

1. Just before equation 6.42 I threw away the last determinant in equation
6.47. This step is valid when the Green's function $g$ varies at length scales
which are much longer than the characteristic scale of variation of $S$. This is
called the diffusive limit, because it means that the Green's function allows
particles to diffuse over distances that are much bigger than the length scale
of the disorder represented by $S$. The diffusive limit is a sort of mean field
approximation.

2. I have put $\hat{S}$ in equation 6.46 even though the correct term to insert would
be the average of $\hat{S}$. This is a diffusive approximation similar to the previous
one, and one can hope that the two will partially cancel each other out.

Note that these approximations are not needed in the $V = 1$ zero dimensional
system, where the factorizations are exact.

Applying equation 6.46, I simplify the effective Lagrangian:

e^'ff = {detiAo)f~''detiAo + i^Q'L) 

X J dS'^dS'SiQ' - s)e^.SLiE--K-j)S' (g 

Note that all of the effective Lagrangian's dependence on $Q^f$ is now outside
of the integral. This is very good news, because it means that I have obtained the
exact $Q^f$ dependence, up to corrections to the diffusive approximation. Moreover,
because the approximations which I have made so far are exact when $S^*$ and $S$ are
translationally invariant, I expect (but can't prove) that any terms correcting the
effective Lagrangian's dependence on $Q^f$ will include at least one spatial derivative
of $Q^f$.

The remaining integral in equation 6.48 can be evaluated exactly in the large $N$ limit
$I^b < N$; this is an old way of taking the diffusive limit [11-15], and was recently
used by Fyodorov to derive a sigma model [10]. In order to maintain a close 
correspondence with the supersymmetric sigma model, I will avoid taking the large 
N limit, and instead examine what further approximations are required in order to 
evaluate the effective Lagrangian when N is small. I will obtain a result which 
is similar to the one obtained by the large N limit, which gives some additional 
confirmation that my regularization and approximations are correct. 

6.3.3.2 The Continuum Regularization 

The integral in equation 6.48 can also be evaluated exactly for any value of $N$
when the system is zero dimensional; $V = 1$. One can also imagine evaluating it
numerically on a lattice. However, as it stands it is ill-defined on the continuum,
because it permits $Q^b$ and $S$ to be totally different at neighboring points. In order to
calculate the continuum effective action, one must regularize $Q^b$ and $S$, forcing them
to be constant at some small length scale. I now do this regularization explicitly
by breaking the system's volume $V$ into $Vk$ blocks with $k^{-1}$ sites per block, and
requiring that $S$ and $Q^b$ are constant within each block. In a zero dimensional
system $k = 1$.

6.3.3.3 The Diffusive Approximation 

If the kinetic energy operator $K$ is zero, then the integral in equation 6.48 factorizes
into $Vk$ integrals, one for each block of constant $S$ and $Q^b$. Each of these integrals
is effectively a zero dimensional integral and can be evaluated exactly. I now make
an approximation and do this factorization even though $K \neq 0$. This is a diffusive
approximation, because it is valid if at the block volume scale $k^{-1}$ the kinetic
energy $K$ is small compared to the other operators.

While this approximation gives a correct treatment of the effective Lagrangian's
zero momentum (translationally invariant) behavior, it does lose some important
information about the effective Lagrangian's low momentum behavior. This is not
a problem because later I will restore these low momentum terms to the Lagrangian
by making the usual arguments from effective field theory.

I now calculate the remaining integral in equation 6.48 for a single block of $k^{-1}$
points. This derivation, up through equation 6.58, is a very slight generalization of
the proof due to Fyodorov [9]. Doing the sums over the spatial index $v$, I obtain:

J dS^dS'SiQ" - s)e^SLiE^-K-j)s' (g 



The delta function in equation 6.49 can be Fourier transformed using the
identity S{G) = (27r)-^' 2^(^-1' / dFe'^'-f-^^'^') to obtain: 

(2^)-'"'2-^'+^'^' J dFd5«d5^ei^'^««'-^)^)e5^^^(^'-^-^)^*(6.50) 
Finally, using equation 6.22 to integrate over $S$, I get:



.JVTr(L)(2^)/ N-r- 2-'"+'°'\i)'^^^'^^^tj^-'\l{E' -K- J)/k), 

= /d^^e3r'^((Q'-s)F)(det(^^-^))-^ (6.51) 



The second superindex on the new function $\eta$ specifies the rank of the matrices
$F$ and $\mu$. $\eta$ is easy to evaluate, and can be reduced to a pretty form if $I^b < N$. I will
evaluate it by establishing a recursion relation between $\eta^{N,I}$ and $\eta^{N-1,I-1}$. The
first step is to transform $F$ to a basis where $Q^b$ is diagonal. I adopt the notation
Qb = WQ^W\ fi = WfiW^ . The measure dF is, of course, invariant under this 
transformation. I obtain: 

r?~-^(M) = ydFe5^'^W"^)det(WW^t_^))-^ 

= ydFet^'W"^)det(F-A))^'^ (6.52) 

Preparing for the recursion, I decompose the I x I matrices F, fi, and into 
parts: 

/ hi fi-i \ 

The $(I - 1) \times (I - 1)$ matrix $F_{I-1}$ is just $F_I$ with the topmost row and leftmost
column removed. There is a useful relation between the determinant of $F_I$ and the
determinant of $F_{I-1}$:

$$\det(F_I) = \det(F_{I-1})\left(f_{11} - f_{I-1}^\dagger (F_{I-1})^{-1} f_{I-1}\right) \tag{6.53}$$
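
This determinant relation is the standard Schur complement formula, and it can be checked on a random Hermitian matrix. In the sketch below `f11` is the top-left element, `f` is the first column with its top entry removed, and `F_sub` is the matrix with the topmost row and leftmost column removed; the size and the random matrix are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
size = 4
a = rng.normal(size=(size, size)) + 1j * rng.normal(size=(size, size))
F = (a + a.conj().T) / 2                 # random Hermitian test matrix

f11 = F[0, 0]
f = F[1:, 0]                             # first column without its top entry
F_sub = F[1:, 1:]                        # topmost row and leftmost column removed

# Schur complement of F_sub; f.conj() @ x is the row vector f^dagger applied to x.
schur = f11 - f.conj() @ np.linalg.solve(F_sub, f)

lhs = np.linalg.det(F)
rhs = np.linalg.det(F_sub) * schur
print(lhs, rhs)
```

Because the relation isolates the dependence on the single element `f11`, the integral over that element can be done on its own, which is what drives the recursion.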
Therefore I can rewrite $\eta$:

r7^'^(A.) = |dF,_iei^'^W^-^-^)det(F,_i-A/-i)-^ 

nrf/urf/iV/iie^^°^^^"(/ii - All 
i>i 

- (/^_i-ALi)(^/-i-A/-i)"'(//-i-A/-i)) (6.54) 

The integral over $f_{11}$ is a simple contour integral. Recall that $\mu = L(E^b - K - J)$
and therefore its imaginary part is positive definite. Therefore the pole is in the
positive imaginary half plane. If $Q^b_{11}$ is positive, the contour integral must extend
into the positive imaginary half plane and therefore includes the pole. If, on the
other hand, $Q^b_{11}$ is negative, then the contour integral extends into the negative
imaginary half plane so that the pole is not included and the integral is identically
zero. We have:

X J dF,_ie3^'-('3°^-i^^-i)det (Fi^i - Az-i)""^ 

X (6.55) 

After a shift in the remaining integration variables $f_{I-1}$, we apply formula 6.22
for bosonic determinants to obtain:



X j dF,_ie*^'-(«"-i^-^)det - A/-i)"'^+' 

= ^A^-iJ-i(^) y_''ii) e^9°iiAii(!g^)'^ '27rV^-i (6.56) 

The formula for ri^'^{^) gives the same result if one defines rj^'^{ii) = 1. Using the 

identities Q^«A« = rr(QV) and [Q^uf^' = (detg'')^"^ one obtains 
the final expression for r], which is valid if > /: 

l=N~I 

(6.57) 

Substituting this result into equation 6.51, I obtain the final value of expression
6.49, the integral of a single block:

Q{Qb)ei-.TriQ-Li&'i^-^)) (det Q'f-'' 

N-l 
l=N-I>' 

(6.58) 

The matrices in this result are all single site matrices, without spatial indices.
The effective Lagrangian contains an instance of this expression for every one of
the $Vk$ blocks. Multiplying them all together and substituting them into equation
6.48 for the effective action, I obtain the following expression for the diffusive limit
of the effective action:



= (det(Ao))^-^dct(Ao+z|^g''L)e(Q'')et^'-W'(^(^^-^-^)))(detg 



^g''L)e(Q'')etrKQ'(i^(^^-i^-./))) (det Qof''^ 

( n ^'^ iNl''^+'^rWNV^-I>'{l''-l)V/2 + l''NV2l''{l'' + l)V/2 (ggg^ 



N-1 -V 



l=N-I 
In equation 6.59 the matrices have regained their spatial indices and the traces once
again include sums over the volume. I have omitted factors of κ in the exponents
of the multiplicative constants; there should be a κ to match each explicit factor
of V. However, in the continuum these multiplicative constants are ignored, and
on the V = 1 lattice κ = 1, so this omission matters only on extended
lattices.

Combining equation 6.59 with equation 6.41, I obtain the diffusive Lagrangian:



Z = 7 / dQf dQ''exp{C) 
Jo''>o 



*>0 

j^TriQ'LQ''L) - -Tr{Qf^) + [N - l')TrHA,) 



Trln{Aa + i^Q'' L)+'-Tr{Q\L{E'> ~K- J))) + ^(iV - /^)(Tr In Q^) 

N-l -V 

''(l'' + l)V /2-li'V/2~Nl''V 



l=N-I 

^Nl''V^-l''{l''~l)V/2~lf lfV/2 



(6.60) 



I switch the units of Q^b so that it will be dimensionless, just like Q^f already
is: Q'' (^^^. Since the integral contains a factor of det^"^'' (Q'')(^Q^ this
transformation multiplies it by a factor of (^)^^ ^ ■ Moreover, I multiply det (Q)
by det (L) = , and compensate by multiplying the constant γ also by det(L).



Z = [ dQf dQ''cxp{C) 



T>0 

^ _ ^-I*NV j^NI*V+lf I^V/2^(I^-N)Tr{L)V-I^I^V+2I^NV 
N-l 

l=N-I'' 

£ = -^Tr{Qf^) + {N~l'')Tr\n{i^Qf + Ef -K) 

- Y^r{Q^LQ^L)+iNTr{Q''L{E'' -K-J)/Cl + k{N - /'')(Tr In (Q''L)) 
+ Tr\n{Ef - K + i^{Qf + Q''L)) (6.61) 

The two terms in the top line are in the vi basis, the three terms in the second 
line are in the vj basis, and the last term is an amphibian, living in the vij basis. 

Equation 6.61 completes the conversion to matrix coordinates and gives the
final expression for the Lagrangian in the diffusive limit. This Lagrangian will be the
starting point for sections 6.6 and IHtI, which calculate observables in the Gaussian
unitary ensemble.

The effective Lagrangian has two continuous global symmetries when the source
J is equal to zero, the energy matrices E^f and E^b are proportional to the
identity, and Q^b and Q^f are translationally invariant. There is a symmetry under
Q^f → U Q^f U^†, where U is a unitary transformation which is diagonal in the spatial
coordinate and also translationally invariant. There is also a symmetry under
Q^b → T Q^b T^{-1}, where T satisfies the constraint TL = LT, is diagonal in the spatial
coordinate, and is also translationally invariant. Both of these symmetries were in
the original exact expression for the effective Lagrangian, given in equation 6.41,
where the symmetry is accompanied by transformations of S and S*. These
symmetries are very important for the physics of our model; in section 6.5 I will
do a saddle point approximation and find Goldstone modes around the saddle point
corresponding to these symmetries.



6.3.4 Corrections to the Diffusive Limit 

Shortly I will make a saddle point approximation and find a saddle point which is
translationally invariant. Because equation 6.61 is exact in the zero dimensional
(V = 1) limit and because the steps of its derivation were rigorous within the
bounds of the chosen regularization and the diffusive approximation, its result should
be exactly correct when Q^f and Q^b are translationally invariant. However, when
I made the diffusive approximation, the Lagrangian lost its correct dependence on
fluctuations in Q^b. It may have also lost similar information about Q^f, but I believe
that Q^f escaped with much lighter damage. Because the final sigma model will
require a correct dependence on fluctuations in Q^f and Q^b, I now analyze the
perturbative corrections to the diffusive Lagrangian.



6.3.4.1 Perturbation Theory 



One option for evaluating the perturbative corrections is to return to the original
definition of the effective Lagrangian, given in equation 6.41. If one knows the
Lagrangian's correct saddle point, then perturbative corrections to the effective
Lagrangian may be evaluated via a Taylor expansion in Q^b and Q^f:



C(n, m) 

(Q'-r.iQf)'" 

^ [(;^) [{-^) det{A)] (6.62) 



Q^b_0 and Q^f_0 are the values of Q^b and Q^f at the saddle point, while Q̃^b and Q̃^f
are the deviations from the saddle point. The powers in n and m of Q̃^b and Q̃^f
must be understood formally, since Q̃^b is a matrix with (I^b)^2 V elements, and Q̃^f
is a matrix with (I^f)^2 V elements. The above Taylor expansion is an expansion
in ((I^b)^2 + (I^f)^2) V distinct variables. C(n, m) is meant to represent the product of all
the Taylor expansion coefficients.

This integral can be further simplified by turning the derivative in into a 
derivative in S. This trick can be easily demonstrated using the following example:

dxg{a)-^S{x ~ /(a)) 

dx6{x-f{a))^-^g{a) (6.63) 

The last step was just an integration by parts and relied on an assumption - conven- 
tional in field theory - that total derivatives can be ignored. This derivative-shifting 
trick is easily generalized to multiple integrals. As long as the dimensionality of the 
delta function is equal to or greater than the dimensionality of the integral, meaning 
that the delta function kills all the integrations, equation 6.63 changes in only two
respects: the derivative ^ turns into a matrix, and ^ becomes a gradient. If 
the delta function does not kill all the integrations, then things are a little more 
complicated: some elements of the matrix ^ diverge because the delta function 
constraint describes a surface. However these same matrix elements are the ones 
multiplying portions of the gradient which run parallel to the constraint surface. 
In this case one must be a little more careful and first determine the space n of 
directions normal to the delta function constraint. One then takes the gradient ^ 
only along directions in n, and multiplies that restricted gradient by the matrix 
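
The one-dimensional version of the derivative-shifting trick can be checked numerically. The sketch below is illustrative only and rests on assumptions that are not in the text: it takes the example to have the form of a test function g(x) integrated against the a-derivative of delta(x - f(a)), replaces the delta function by a narrow Gaussian of width EPS, and uses the hypothetical choices f(a) = a^2 and g(x) = sin(x). Under those assumptions, integration by parts moves the derivative onto g at the cost of a factor f'(a).

```python
import math

EPS = 0.05  # Gaussian width standing in for the delta function (assumption)

def delta_eps(u):
    """Narrow normalized Gaussian used as a smooth stand-in for delta(u)."""
    return math.exp(-u * u / (2 * EPS * EPS)) / (EPS * math.sqrt(2 * math.pi))

def f(a):
    """Hypothetical constraint function f(a) = a^2."""
    return a * a

def f_prime(a):
    return 2 * a

def g(x):
    """Hypothetical test function g(x) = sin(x)."""
    return math.sin(x)

def g_prime(x):
    return math.cos(x)

def integrate(func, center, half_width=10 * EPS, steps=4000):
    """Midpoint rule on [center - half_width, center + half_width]."""
    h = 2 * half_width / steps
    return h * sum(func(center - half_width + (n + 0.5) * h) for n in range(steps))

def derivative_on_delta(a):
    """Integral of g(x) * d/da delta(x - f(a)); the a-derivative acts on the delta.

    For the Gaussian, d/da delta(u) = f'(a) * (u / EPS^2) * delta(u) with u = x - f(a).
    """
    c = f(a)
    return integrate(
        lambda x: g(x) * f_prime(a) * (x - c) / EPS**2 * delta_eps(x - c), c)

def derivative_on_g(a):
    """Integral of delta(x - f(a)) * f'(a) * dg/dx; the derivative now acts on g."""
    c = f(a)
    return integrate(lambda x: delta_eps(x - c) * f_prime(a) * g_prime(x), c)

a0 = 0.7
left = derivative_on_delta(a0)
right = derivative_on_g(a0)
print(left, right)
```

The two numbers agree to quadrature accuracy, and both approach f'(a) g'(f(a)) as EPS shrinks. In the multi-dimensional case described above, f'(a) would become a matrix and the derivative of g a gradient.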

Hiding these complications in the formalism I obtain the simplified effective 
Lagrangian: 



[(A) (4-) det{A)ei'^^^'~^~'n (6.64) 

This is a nonlinear sigma model with constraint S = Q_0. The effective Lagrangian
L_eff is equal to the sum of all of this sigma model's connected diagrams,
and can probably be evaluated perturbatively using conventional field theory
techniques.

Of course this approach is no good for evaluating non-perturbative terms in the 
Lagrangian, including the terms which determine the saddle point. Furthermore, the 
perturbative calculations will require us to regularize the theory, just as we already 
did when deriving the diffusive Lagrangian. The strengths of the corrections which 
we calculate will be determined by the details of the regularization, not by the 
original Lagrangian specified in equation 6.41.

6.3.4.2 Effective Field Theory 

Instead of deriving the form of the perturbative corrections from the original La- 
grangian, I will determine them by using arguments from effective field theory. 

I assume that the effective Lagrangian is local. I also assume that it will have
the same symmetries that the original exact Lagrangian, given in equation 6.41, has
when E^f and E^b are proportional to the identity, the source J is zero, and both
Q^f and Q^b are translationally invariant. In particular, Q^b always comes paired with
exactly one L and there are no constant operators (other than the identity) which
I can insert into the effective Lagrangian. Corrections to these assumptions will be
suppressed by powers of k^2, E^f's deviation from the identity, and E^b's deviation
from the identity.

I am only interested in the low momentum corrections, so I expand the Lagrangian
in powers of the gradient, and keep only terms which are at most quadratic;
i.e. terms with no gradients, one gradient, two gradients, or a ∇^2. I assume that
the theory is rotationally invariant, so terms with one ∇ are necessarily zero.
Moreover the Lagrangian already includes the correct result for translationally invariant
fields. So the only possible corrections have either two gradients or a ∇^2.

I anticipate the saddle point approximation in section 6.5.2 and assume that the
fields Q^b and Q^f are small. Therefore I consider only the terms ∇^2 Q^f, ∇^2 Q^b L,
Q^f ∇^2 Q^f, ∇Q^f · ∇Q^f, Q^b L ∇^2 Q^b L, ∇Q^b L · ∇Q^b L, Q^f ∇^2 Q^b L, Q^b L ∇^2 Q^f, and
∇Q^f · ∇Q^b L. As is conventional in perturbative field theory, I assume that terms
which are total derivatives can be ignored, which eliminates ∇^2 Q^f and ∇^2 Q^b L.
Moreover partial integration reveals that several of the other terms are equal up to 
a total derivative. After weeding these out, the remaining possible corrections are 

The Lagrangian contains a trace over all indices. The ∇Q^f · ∇Q^b L term,
for instance, should be written as Σ_{ijv} ∇Q^f(vi) · ∇Q^b(vj) L_j. In this term the
sum over j acts only on Q^b L and the sum over i only on Q^f, so the traces can be
moved inside the derivatives, resulting in: ∇(Σ_i Q^f(vi)) · ∇(Σ_j Q^b(vj) L_j). I again
anticipate the saddle point approximation, which will constrain the traces of Q^f and
Q^b L to be constant. Therefore both the Q^f ∇^2 Q^b L term and the ∇Q^f · ∇Q^b L term
are suppressed by the saddle point approximation.

The only two remaining terms are the kinetic terms ∇Q^f · ∇Q^f and ∇Q^b L ·
∇Q^b L. As I mentioned earlier, Q^f seemed to escape lightly from the diffusive
approximation. Moreover, we will see in section 6.5.5 that the diffusive Lagrangian
already contains a kinetic term for Q^f which looks just right. Therefore I doubt
that the ∇Q^f · ∇Q^f correction is nonzero. However, section 6.5.5 will also show
that the diffusive approximation totally annihilated Q^b's kinetic term, making it
absolutely necessary to restore it.

Adding these perturbative corrections to the diffusive Lagrangian, I obtain the
result:

Z = 7/ dQf dQ''exp{C) 
Jq''>o 

^ _ ^-l''NV -^Nl''V+lf I^V/2^(l''~N)Tr(L)V-l''l''V+2l''NV 
N-1 -V 

X ( n ^'^ 2^''(^''+i)^/2-/^v^/2j^-/''(/''-i)i//2-/^/-'y/2 

l=N-I 
N 9 

£ = -—Tr{Qi) + {N-l'')Trln{t^Qi+Ei-K) 

- ^Tr{Q''LQ''L)+iNTr{Q^L{E^ -K-J)/C) + niN - /'')(Tr In (Q^'L)) 

+ Trln{Ef -K + t^Q^ + Q^'L)) + ^!^Tr{VQf ■ VQ^) + i^Tr(VQ''i • VQ^i) 



(6.65) 
6.4 Reparameterization for Continuous Systems

Three changes to equation 6.65 are required before obtaining the continuum model:

1. One must throw away the prefactor γ because in the continuum it diverges
and contains no physical meaning.

2. One must change the constants in the Lagrangian. I define an intensive 
constant - the level density = Nv~^£^~^ - which will replace N. {v is the 
volume of a single point.) Also I define a = /N . Since the level density has units of
inverse volume, I now allow the traces over the spatial index to acquire units 
of volume and become integrals d^x. However, I will continue using the trace 
notation as before. The trace now includes an integral over the volume index 
V and sums over the fermion index i and the boson index j. 

3. I will have to regularize the spatial variation of the fields, prohibiting
variations at small length scales. I have already imposed this regularization on
the bosonic matrix Q^b while calculating the diffusive limit of the effective
Lagrangian. The regularization of the fermionic matrix Q^f will occur later,
when I derive the effective Lagrangian of the sigma model.

If I carry out changes U and 13 I obtain the following equation, which can be 
applied to both discrete lattices and also the continuum; 

Z = [ dQ^dQfe^ 
Jq>'>o 

- ^-Tr{Q^LQ^L)+ivTr{Q^L{E'' -K - J)) + u^nil - a){Tr\n{Q^ L)) 
+ ^Tr \n{E^ -K + i({Qf + Q'L)) 

+ Y^TriVQf ■VQf ) + Y^Tr{VQ^L-VQ'L) (6.66) 

Recall that the traces on the first line are over the volume and fermionic indices v 
and i, the traces on the second line are over the volume and bosonic indices v and 
j, and the last trace is over all three indices. 

6.5 The Sigma Model 

My strategy for evaluating the continuum generating function (equation 6.66) will
be to use the saddle point approximation, which will create constraint equations
and thus give me a sigma model. Because the degrees of freedom are matrices,
the saddle point equations are matrix equations, and are derived by taking matrix
derivatives. Therefore I now briefly review how to take matrix derivatives.

6.5.1 Matrix Derivatives 

On a conceptual level, taking matrix derivatives is just a matter of taking derivatives
with respect to individual matrix elements of the matrices. I use the notation δ to
represent two things: the total variation δF of a function F, and also the matrix
infinitesimal δ which can be assumed to have all but one of its matrix elements
equal to zero.

• The First Derivative of a Polynomial: One can verify that δ Tr(X^m) =
m Tr(δ X^{m-1}) by writing X → X + δ, expanding the trace to first order in
δ giving Tr((X + δ)^m) = Tr(X^m) + Σ_{j=0}^{m-1} Tr(X^j δ X^{m-j-1}), reorganizing the
trace using the cyclic identity Tr(AB) = Tr(BA) to obtain Tr((X + δ)^m) =
Tr(X^m) + m Tr(δ X^{m-1}), and then taking the derivative with respect to the
infinitesimal.

• The Second Derivative of a Polynomial: The matrix second derivative
can be obtained by taking the derivative of the first derivative; in this case
the cyclic identity cannot be used again and one obtains δ_1 δ_2 Tr(X^m) =
m Σ_{j=0}^{m-2} Tr(δ_1 X^j δ_2 X^{m-j-2}).

• The First Derivative of the Inverse: Turning to δ Tr(A X^{-1}), one writes
Tr(A (X + δ)^{-1}) = Tr(A X^{-1} (1 + δ X^{-1})^{-1}), does a Taylor series expansion
(1 + δ X^{-1})^{-1} ≈ 1 - δ X^{-1}, and obtains δ Tr(A X^{-1}) = -Tr(A X^{-1} δ X^{-1}).

• The First Derivative of the Logarithm: δ Tr(ln(X + δ)) = δ Tr(ln(1 + X^{-1} δ)) =
Tr(X^{-1} δ).

In general there is a close analogy between the formulas for matrix derivatives and 
those for scalar derivatives. 
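
These rules are easy to test numerically. The following sketch is an illustration rather than part of the derivation: it compares three of the formulas against central finite differences for an arbitrary 2x2 real matrix. The matrices X, A and the single-element infinitesimal D are hypothetical test values, and the logarithm rule is checked through the standard identity Tr ln X = ln det X, which holds here because det X is positive.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def matpow(a, m):
    n = len(a)
    out = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        out = matmul(out, a)
    return out

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def shifted(x, d, eps):
    """Return x + eps * d for 2x2 matrices."""
    return [[x[i][j] + eps * d[i][j] for j in range(2)] for i in range(2)]

# Hypothetical test data: a generic matrix X, a fixed matrix A, and an
# "infinitesimal" direction D with a single nonzero matrix element.
X = [[2.0, 0.3], [-0.4, 1.5]]
A = [[0.7, -1.2], [0.5, 0.9]]
D = [[0.0, 1.0], [0.0, 0.0]]
EPS = 1e-6
M = 4

def central_diff(func):
    """Directional derivative of func along D at X, by central differences."""
    return (func(shifted(X, D, EPS)) - func(shifted(X, D, -EPS))) / (2 * EPS)

X_INV = inv2(X)

# delta Tr(X^m) = m Tr(delta X^(m-1))
poly_numeric = central_diff(lambda y: trace(matpow(y, M)))
poly_formula = M * trace(matmul(D, matpow(X, M - 1)))

# delta Tr(A X^-1) = -Tr(A X^-1 delta X^-1)
inv_numeric = central_diff(lambda y: trace(matmul(A, inv2(y))))
inv_formula = -trace(matmul(matmul(matmul(A, X_INV), D), X_INV))

# delta Tr ln X = Tr(X^-1 delta), using Tr ln X = ln det X
log_numeric = central_diff(lambda y: math.log(det2(y)))
log_formula = trace(matmul(X_INV, D))

print(poly_numeric, poly_formula)
print(inv_numeric, inv_formula)
print(log_numeric, log_formula)
```

Each pair of printed numbers should agree up to the finite-difference error, which illustrates the analogy with the scalar rules d(x^m) = m x^{m-1} dx, d(a/x) = -a dx/x^2 and d(ln x) = dx/x.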



6.5.2 The Saddle Point 

I want to find the saddle point determined by the equations: 



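\left.\frac{\partial \mathcal{L}}{\partial Q^f}\right|_{Q^f_0,\,Q^b_0} = 0, \qquad \left.\frac{\partial \mathcal{L}}{\partial (Q^b L)}\right|_{Q^f_0,\,Q^b_0} = 0    (6.67)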



Q^f_0 and Q^b_0 are the solutions of the saddle point equation 6.67. In order to
simplify my equations, I define the Green's functions:

G^f(X) \equiv (E^f - K + i\xi X)^{-1}

G^b(X) \equiv (E^b - K + i\xi X)^{-1}    (6.68)

I also neglect the source J because it is infinitesimally small and should not
determine the saddle point solutions. Referring to the Lagrangian in equation 6.66,
one finds that the derivatives in equation 6.67 are:

: -uS,Tr{5Ql) + iue{l - a)Tr{5Gf [Qi)) + -—Tr{^5 ■ VQjJ) 



dQi 



dC 
dQ^L 



Tr{5Gf {Qi + QlL)) 



lb 

= -vS,Tr{5QlL)+ivTr{6{E^ - K)) + ~ a)Tr{5{QlL f') 



C2 

2 TT{V5-VQlL) + ^^Tr{5Gf{Qi + QlL)) (6.69) 
Remember that most of the traces are over only two indices, while the last term
in each equation is traced over all three. I would also like to point out that the
quantity ξQ^f_0 in the Green's function G^f(Q^f_0) = (E^f - K + iξQ^f_0)^{-1} is exactly the self
energy of the Green's function. (This is because the self energy of a Green's function
is defined as being -i times the quantity which is added to E - K.) The physical
origins of a Green's function's self energy lie in scattering off of disorder. In fact the self
energy is equal to the inverse of the scattering time τ when using units where ħ = 1.
Q^f_0 is a sort of dimension-free self energy and its inverse is proportional to the
scattering time.



Another very important point: these saddle point equations say nothing about
matrix elements that are not diagonal in the position basis. This is because Q^f and
Q^b are local operators, diagonal in the position basis, so their infinitesimals are also
diagonal in the position basis.

Remembering that the Lagrangian has a continuous global symmetry when the
source J is equal to zero and the energy matrices E^f and E^b are proportional to
the identity, for the rest of this section I will enforce this proportionality by choosing
E^ ^l(g)E\ where E^ = I^^^J^tEz- Similarly I choose # = 1®e\ where 

After I have made these choices for E^f and E^b and have set J = 0, the saddle
point equation 6.67 does not depend on any aspect of Q^f_0 and Q^b_0 except their
eigenvalues. I introduce the notation q_i to signify the eigenvalues of Q^f_0, p_j to
signify the eigenvalues of Q^b_0, and L_j to signify the diagonal elements of L. In
order to simplify the saddle point equations, I temporarily require that Q^f_0 and Q^b_0
be diagonal. This does not change the eigenvalues and therefore does not change
the content of the saddle point equations.

Because Q^f_0 is diagonal, the fermionic Green's function G^f defined in equation
6.68 is diagonal in the index i. Therefore I define the i'th component of G^f to be:

G^f_i(X) \equiv (E^f_i - K + i\xi X)^{-1}    (6.70)

The saddle point equations become: 

= ^e(l-a)(f|Gf(gO|:^> + ^E(^l^^(*+^'^■^J■)l^) 

i 

PjL, = iC'e' - ir'{x\KW) + ^{l - a){pjL,y' + {x\G{ [p^+PjL,] 



(6.71)



Equation 6.71 includes terms containing the diagonal element of the fermionic
Green's function ⟨x|G^f_i|x⟩. One can estimate the real part of this matrix element,
and according to Mirlin [69] it is small compared to other quantities if we regularize
the continuum behavior of Q^f and force it to be smooth at small distance scales.
Moreover, Mirlin claims the freedom to shift the definition of the energy (E →
E + ε) and thus set the real part of ⟨x|G^f_i|x⟩ to exactly zero. Therefore I will only
calculate the imaginary part. The first few steps are:
= \int_0^\infty dE\, (E^f_i - E + i\xi X)^{-1} \int \frac{d^Dp}{(2\pi)^D}\, \delta(E - \langle p|K|p\rangle)    (6.72)

I define a new quantity, the average spacing of energy levels, usually called the 
level spacing: 

\Delta^{-1}(E) \equiv \int \frac{d^Dp}{(2\pi)^D}\, \delta(E - \langle p|K|p\rangle)    (6.73)

The level spacing has units of inverse energy. Then I have: 

\mathrm{Im}\,\langle y|G^f_i(X)|y\rangle = \int_0^\infty dE\, \Delta^{-1}(E)\, \mathrm{Im}\left((E^f_i - E + i\xi X)^{-1}\right)    (6.74)

I have assumed that the kinetic energy {p\K\p) is never negative, which seems quite 
reasonable. 

I now assume that ξX = τ^{-1} ≪ E^f, and that therefore Im((E^f - E + iξX)^{-1})
is effectively zero for all negative energies. ξX is the self energy and τ is the scattering
time; this approximation asserts that the particle's energy is large compared to
its scattering-induced self energy, which is a way of saying that the particle is
only weakly affected by scattering. In other words, this approximation is a way of
taking the diffusive limit and thus regularizing the short distance behavior of the
continuum theory. This diffusive approximation probably should not be construed as
a statement about the energy E, which I already have claimed (following Mirlin)
can be set to zero.

Next I play a trick: I define a level density at negative kinetic energies, Δ^{-1}(-E) =
Δ^{-1}(E), and extend the integral down to negative infinity:

Im{y\G,{X)\y) = \ ( dEdr\E)Im{{E^ - E + i^X)'^ ) (6.75) 

^ J -oo 

The diagonal matrix element of the Green's function is now equivalent to a
contour integral over E. This was the whole point of transforming the integral over
p into an integral over E, making the diffusive approximation, and then extending
the integral down to -∞. This approach is standard in the literature, and will be
reused later in this chapter.

Doing the contour integral, I obtain: 

\langle y|G^f_i(X)|y\rangle = -i\pi\, \Delta^{-1}(E^f)\, \mathrm{sign}(X).    (6.76)
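
The elementary integral behind this step can be checked numerically. The sketch below is only a sanity check under stated assumptions: the level density is treated as a constant and factored out, gamma stands for the self energy ξX, and the integration window and parameter values are arbitrary illustrative choices. It integrates Im[(E^f - E + i gamma)^{-1}] over a large but finite window and compares the result with the closed form -2 sign(gamma) arctan(W/|gamma|), which tends to -π sign(gamma) as the half-width W grows.

```python
import math

def imag_green_integral(e_f, gamma, half_width=50.0, steps=200000):
    """Midpoint-rule integral of Im[1 / (e_f - E + i*gamma)] over
    E in [e_f - half_width, e_f + half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for n in range(steps):
        energy = e_f - half_width + (n + 0.5) * h
        total += (1.0 / complex(e_f - energy, gamma)).imag
    return total * h

def finite_window_exact(gamma, half_width=50.0):
    """Closed form of the same integral: -2 sign(gamma) atan(W / |gamma|)."""
    return -2.0 * math.copysign(1.0, gamma) * math.atan(half_width / abs(gamma))

# Illustrative values: energy 3.0, self energy +/- 0.5.
upper = imag_green_integral(3.0, 0.5)    # gamma > 0, expect close to -pi
lower = imag_green_integral(3.0, -0.5)   # gamma < 0, expect close to +pi
print(upper, lower, math.pi)
```

The sign of the result follows the sign of the self energy, which is where the sign(X) in the contour integral comes from; the small remaining difference from π is the contribution of the tails outside the finite window.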

It is conventional to neglect the dependence of the level spacing Δ on the energy
E. The fermionic saddle point equations become:
Qi = 71-^(1 - a) A ^sign{qi) H ^ sign{qi + pjLj)

j 

PjLj = iC^E^ -iC'^{x\K\x)+K{l-a){pjLj)~^ + ^^J2^^9n{qi+PjLj) 

(6.77) 

Without any good justification, I now neglect p_jL_j in the sum q_i + p_jL_j. (As
I will explain later, it will turn out that the only possible effects on the final sigma
model would be changes in the values of the saddle points and corrections to
certain constants in the sigma model Lagrangian.) This simplifies the saddle point
equations:

Qi = Tr^A~'^ sign{qi) 
PjLj = iC^e'' -i^-^{x\K\x) + K{l-a){pjLj)~'^ + ^^Y^sign{qi) 

i 

(6.78) 

The solutions of the fermionic saddle point equations are clearly: 

q_i = s_i \pi \xi \Delta^{-1}, \qquad s_i = \pm 1    (6.79)

The bosonic saddle point equation is a quadratic equation 
= a{pjLj)^ — tbpjLj + c, where the coefficients are a = 1, 6 = ^~^E — 
^~^{x\K\x) — iji^ J2i sign{qi), and c = — /t(l — a). The solutions have the form: 

p_j L_j = i b + s_j \sqrt{\kappa(1 - \alpha) - b^2}, \qquad s_j = \pm 1    (6.80)

If the imaginary part of b can be neglected and b^2 < κ(1 - α), then the solution
can be rewritten in the form:

p_j L_j = s_j \sqrt{\kappa(1 - \alpha)}\, \exp(i s_j \phi), \qquad \sin(\phi) = \frac{b}{\sqrt{\kappa(1 - \alpha)}}, \qquad s_j = \pm 1    (6.81)
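
The equivalence of the two forms of the bosonic solution can be verified directly. The sketch below is a check under an explicit assumption: it takes the quadratic to be normalized as x^2 - 2ibx - κ(1 - α) = 0 with x = p_jL_j, which is the normalization under which the root formula quoted above holds exactly. The values of b, κ and α are arbitrary illustrative numbers with b real and b^2 < κ(1 - α).

```python
import cmath
import math

def root_cartesian(b, kappa, alpha, s):
    """Root written as i*b + s*sqrt(kappa*(1 - alpha) - b^2)."""
    return 1j * b + s * cmath.sqrt(kappa * (1.0 - alpha) - b * b)

def root_polar(b, kappa, alpha, s):
    """Same root written as s*sqrt(kappa*(1 - alpha))*exp(i*s*phi),
    with sin(phi) = b / sqrt(kappa*(1 - alpha))."""
    radius = math.sqrt(kappa * (1.0 - alpha))
    phi = math.asin(b / radius)
    return s * radius * cmath.exp(1j * s * phi)

def residual(x, b, kappa, alpha):
    """Assumed normalization of the quadratic: x^2 - 2*i*b*x - kappa*(1 - alpha)."""
    return x * x - 2j * b * x - kappa * (1.0 - alpha)

# Illustrative parameters (not taken from the text).
B, KAPPA, ALPHA = 0.3, 2.0, 0.25

for s in (+1, -1):
    x_cart = root_cartesian(B, KAPPA, ALPHA, s)
    x_pol = root_polar(B, KAPPA, ALPHA, s)
    print(s, x_cart, x_pol, abs(residual(x_cart, B, KAPPA, ALPHA)))
```

Both forms give the same complex number, its modulus is sqrt(κ(1 - α)) independent of b, and setting b to zero makes the root real, so all of the energy dependence sits in the phase.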

Interestingly enough, the energy dependence of the solution is entirely inside the
phase φ. In zero dimensions the saddle point solution is analogous to the one
found here, but with κ = 1. I surmise that if I had not neglected the real part
of the Green's function when solving the fermionic saddle point equation, I would
have found an energy dependent phase in the solution. However I got rid of the
fermionic phase by making a suitable shift in the energy E^f. I repeat the step here
and set b = 0, φ = 0. The saddle point solutions now read:

q_i = s_i \pi \xi \Delta^{-1}, \qquad p_j L_j = s_j \sqrt{\kappa(1 - \alpha)}    (6.82)

Remember that the eigenvalues of Q^b L were constrained to be positive; therefore
s_j = L_j.

Both κ and Δ entered into the equations during the step of regularizing the
continuum theory. I regularized the bosonic matrix Q^b by breaking the lattice into
blocks of size κ, requiring Q^b to be constant in each block, and requiring that the
blocks were big enough that the diffusive Lagrangian could be factorized into a
product of its values at each block. I regularized Q^f by converting a momentum
integral for ⟨y|G_i(X)|y⟩ into a contour integral, and in the process introduced Δ.
Therefore I redefine κ in terms of a bosonic level spacing: √κ = πξΔ_b^{-1}, and I
rename the fermionic level spacing to be Δ_f.

In this notation, the final saddle point solutions are the set of all Hermitian
matrices Q^f_0 and Q^b_0 that have the following eigenvalues:

q_i = s_i \pi \xi \Delta_f^{-1}, \qquad p_j = \pi \xi \Delta_b^{-1} \sqrt{1 - \alpha}, \qquad s_i = \pm 1    (6.83)

One would hope that Δ_b = Δ_f. If so, my earlier neglect of p_jL_j in the sum
q_i + p_jL_j did not affect the saddle point solutions at all.

Values of Q^b and Q^f which do not satisfy equation 6.83 at every point in space
are massive and are therefore strongly disfavored by the Lagrangian. This results
in a constraint that Q^b and Q^f satisfy the saddle point equations at every point in
space. The presence of a constraint implies that this model is a sigma model.

6.5.3 The Sigma Model Lagrangian, Part I 

In the last section I derived the saddle point solutions, which define a sigma model
with the constraint that the eigenvalues of Q^f and Q^b are fixed. The saddle point
constraint actually defines a continuous manifold at each point in space, since
unitary transformations of Q^f are not constrained by the saddle point equations,
and similar degrees of freedom exist within Q^b. When the source J is zero and
the energy operators E^f and E^b are both proportional to the identity, the model
has degrees of freedom with exactly zero mass which correspond to varying Q^b and
Q^f while both preserving their translational invariance and also obeying the saddle
point constraint. I will call these degrees of freedom zero modes. There are also
many other degrees of freedom which do not preserve the saddle point constraint,
or exhibit a variation in space, or both. These are all massive.

Now I begin the process of deriving the sigma model's action. This amounts to
making a perturbative expansion of the Lagrangian given in equation 6.65 around
the saddle point manifold. Clearly one must expand in Q^b and Q^f. However, as
Kamenev and Mezard [74] pointed out in their paper explaining how to do certain
non-perturbative calculations with the replica technique, the zero modes of Q^f and
Q^b cannot be treated perturbatively because they are free to vary over the whole
saddle point manifold. Therefore I perform the decompositions Q^b = Q^b_0 + Q̃^b
and Q^f = Q^f_0 + Q̃^f. Q^f_0 and Q^b_0 are the zero modes: the translationally invariant
degrees of freedom which obey the saddle point constraint. They will be treated
non-perturbatively. In contrast, Q̃^f and Q̃^b represent all the other degrees of freedom;
i.e. the ones which disobey the saddle point constraint, or are not translationally
invariant, or both. These will be treated perturbatively.

Because I chose a saddle point with the source J equal to zero and both of
the energy matrices E^f and E^b proportional to the identity, I also have to expand
in these quantities. I will call the actual deviation of E^f from the identity ω^f =
Ef and similarly Co^ = E^ - 1 ^^ 

I decompose the Lagrangian into the part L_0 depending on the zero modes Q^f_0
and Q^b_0 and the part L_σ depending on the other degrees of freedom. To second order
in Qb, , , E^, and J, the sigma model Lagrangian is:



Co + iuTriQ^LiCj" ~ J)) + 
dC . , dC , 



+ 



dC 



dEfdQf SE=uf,5f=Qf dEf 5e = 



1, d^C 



(6.84) 



1 d'^C 

d^C 

+ ^diQbL)dQf\^^Q,^,^^Q, 
The Lagrangian L_0 of the zero modes is given by:

£0 = -^TriQl^) + ,ya^~a)TrHEf +t^Ql) 

- ^-Tr{QlLQlL)^ivS,Tr{QlL{Cj' - J)/0 + ^^^(1 - a) (^)'(Tr In (g^L)) 



+ ^Trln(^/+^^(Q^ + QgL)) 



(6.85) 



The first derivatives ∂L/∂Q^f and ∂L/∂(Q^b L) required by equation 6.84 are given by
equation 6.69. The second derivatives required by equation 6.84 are:



d^C 



dEfdQf 



d^C 
dQ{dQ{ 



d^C 
d{Qi'L)dQf 
d^C 
d[QlL)d{Q\L) 



i^Df 

-iy^Tr{5iS2) + -^Tr{V5i ■ VcJs) 
ye{l~a)Tr{5^Gf{Qi)62Gf{Qi)) 



Y,Tr{5,G^{Qi + q,L.i)52Gf{Ql + q,L,)) 
Y,Tr{5f{ii) G{{Qi + QlL)5,G{{Qi + Q^L)) 



-v^Tr{5i52)~^^ve/:^l\l-a)Tr{5ML) '^2(0^^) ') 



vD 



j6 T..T^i^^G{ (q, + QlmGiiq. + QlL)) 



2 



(6.86) 



I have been able to get rid of the triple traces over the vij indices, reducing them
to traces over either vi or vj. The notation δ^f(ii) means the diagonal elements of
δ^f.

Throughout the analysis of the sigma model Lagrangian L_σ I will use the two
bases where Q^f_0 and Q^b_0 are diagonal to analyze all the matrix products and traces
that I run into. This will simplify a lot of calculations. However it will also cause
L_σ to depend on Q^f_0 and Q^b_0, because as these matrices vary over the saddle point
manifold the bases in which they are diagonal also change.

The derivative ^^.^f^.^ is exactly zero because of the saddle point equation. 
The -^j is also drastically simplified by the saddle point, and reduces to a term 

proportional to W^Q^ . This is a total derivative and therefore can be ignored. The 
term in equation 6.84 has no effect on and Q'', and therefore I have taken
the liberty of moving this term from to the zero mode Lagrangian Cq. If I had 
not moved this term, then the zero mode Lagrangian would have contained E 's 
instead of E-^ 's. 

There are four terms in equation 6.86 that are proportional to α, all of which
contain two instances of the propagator G^f. In the last two terms, the two
propagators share the same index i. I will show in sections 6.5.4 and 6.5.5 that the
product of two propagators with a shared index evaluates to zero in the diffusive
limit. Therefore these two terms can be ignored. On the other hand, the first
two terms that are proportional to α have exact analogues that are proportional to
1 - α. The only difference is that they have the argument Q^f_0 + p_jL_j instead of
just Q^f_0. If one neglects p_jL_j, equation 6.86 loses all its dependence on α. So I will
simplify my life by setting α = 0, which is equivalent to neglecting p_jL_j. This is
consistent with my choice to neglect p_jL_j when finding the fermionic saddle point.
As it turns out, the only possible ill effects upon the final sigma model Lagrangian 
will be changes of order a in the fermionic diffusion constant, the dependence on 
/ 



w-" , and on the saddle point eigenvalues of Qq 



I now combine equations 6.84 and 6.86. I also use the fact that L(Q^b_0 L) is
proportional to the identity in the basis where Q^b_0 is diagonal. Thus I obtain:



+ -viTr{Q Q) + -^Triyqf ■ VQO 
+ veTr{Qf {Qi) Qf Gf {Qi)) 



i^^Tr{Q'LQ'L)~,^^Tr{Q'' Q^) + —Tr{^Q^L ■ ^Q^L) (6.87) 



Equation 6.87 contains two terms of the form Tr(A^1 G^f(Q^f_0) A^2 G^f(Q^f_0)), so
I need to find the low momentum behavior of these terms. I start by decomposing
the trace in its i index. Because I have chosen to use the bases where Q^f_0 and Q^b_0
are diagonal, G^f is also diagonal. Therefore:

\mathrm{Tr}\left(A^1 G^f(Q^f_0) A^2 G^f(Q^f_0)\right) = \sum_{i_1, i_2} A^1(i_1 i_2)\, G^f_{i_2}(Q^f_0)\, A^2(i_2 i_1)\, G^f_{i_1}(Q^f_0)    (6.88)

Next I expand the A's into their momentum components, using the identities 

A ^j£^A{t), {S\Ait)\u) = S^is + t-u), and 1 ^ J Then the 

trace turns into: 
Tr{A^ Gf{Qi)A^ Qf {Qi))

d^t d^u d^v d^w v-^ .1/. . I <9/. . -^M ^ ^i^fi^x 

fo ,d E (^i^^, (^^ + ^ 1^4 k) A^(z2»i, ^) + 

[Ztt) {Ztt) {Ztt) (27rj ^^^^^ 

dDt d^w {w\Gi\w) A'{t2ii,-t) {w-t\Gl\w~t) 



d^t ^ . . f d^" 



(27rj J, .J, J (27r) 



The integral over w describes two Green's functions forming a loop which car- 
ries a total momentum equal to t. t is the momentum of the A matrices, which 
correspond to and Cj^ . A careful calculation of this integral involves some 
complexities, and will be the subject of the next section. 

6.5.4 Calculating the One Loop Integral 

In this section I calculate the low momentum behavior of the one loop interaction
of two fields, one field with momentum k and the other with momentum -k. (k
corresponds to t in the last section.) The fields are connected by two Green's
functions with the form given by equation 6.70. The loop integral is:



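M = \int \frac{d^Dp}{(2\pi)^D}\, \frac{1}{E_1 - E(p+k) + i\xi I_1}\; \frac{1}{E_2 - E(p-k) + i\xi I_2}

  = \int de \int \frac{d^Dp}{(2\pi)^D}\, \delta(e - E(p))\, \frac{1}{E_1 - e - (E(p+k) - e) + i\xi I_1}\; \frac{1}{E_2 - e - (E(p-k) - e) + i\xi I_2}    (6.90)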



The integral over energy e has zero as its lower limit and infinity as its upper
limit, but if I assume that the imaginary factors ξI_i are much smaller than the
energies E_i, then the lower limit can be extended down to minus infinity, so that I
have a contour integral. This is just the diffusive approximation which I introduced
in section 6.5.2. One immediate result of the diffusive approximation is that if
sign(I_1) = sign(I_2), then the loop integral evaluates to exactly zero. Another
consequence is that when k = 0 the loop integral M can be computed exactly; it
comes out to:

M(A:^0)^ ^— (y^) (6.91) 

This expression is valid for any value of the energy difference ω = E_1 - E_2. Δ is
the level spacing which I defined in equation 6.73.
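
The sign rule for the k = 0 loop can be illustrated numerically. The sketch below makes simplifying assumptions that are not in the text: the level density is treated as constant and dropped, g1 and g2 stand for the imaginary parts ξI_1 and ξI_2, and all parameter values are arbitrary. It integrates the product of two propagators over the whole real energy axis, using the substitution e = tan(t), and compares the result with the residue-theorem prediction: zero when the two imaginary parts have the same sign, and sign(g1) 2πi/(z1 - z2) when they differ, where z_n = E_n + i g_n.

```python
import cmath
import math

def loop_integral_k0(e1, g1, e2, g2, steps=2000):
    """Integral over all real e of 1 / ((e1 - e + i*g1) * (e2 - e + i*g2)).

    With e = tan(t) the integrand becomes
    1 / ((z1*cos t - sin t) * (z2*cos t - sin t)) on (-pi/2, pi/2),
    which is smooth and periodic, so the midpoint rule converges quickly.
    """
    z1, z2 = complex(e1, g1), complex(e2, g2)
    h = math.pi / steps
    total = 0j
    for n in range(steps):
        t = -math.pi / 2 + (n + 0.5) * h
        c, s = math.cos(t), math.sin(t)
        total += 1.0 / ((z1 * c - s) * (z2 * c - s))
    return total * h

def residue_prediction(e1, g1, e2, g2):
    """Contour-integral result: zero if both poles lie in the same half plane."""
    if g1 * g2 > 0:
        return 0j
    z1, z2 = complex(e1, g1), complex(e2, g2)
    return math.copysign(1.0, g1) * 2j * math.pi / (z1 - z2)

# Illustrative values (not from the text).
same_sign = loop_integral_k0(1.0, 0.5, 0.8, 0.3)
opposite_sign = loop_integral_k0(1.0, 0.5, 0.8, -0.3)
print(same_sign, residue_prediction(1.0, 0.5, 0.8, 0.3))
print(opposite_sign, residue_prediction(1.0, 0.5, 0.8, -0.3))
```

When the two energies coincide the nonzero case reduces to 2π/(|g1| + |g2|), so in this simplified setting the loop is controlled by the difference of the two imaginary parts.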
Note two properties of the loop integral: 

• If ω = 0, the loop integral has an interesting symmetry: M(E, I_1, I_2, k) =
M(E, I_2, I_1, -k). In systems with translational symmetry, M will depend on
k^2 instead of k; therefore in these systems M(E, I_1, I_2, k^2) = M(E, I_2, I_1, k^2).



(6.89) 
• In systems which are invariant under the parity transformation p → -p, the
loop integral is real when ω = 0 and I_1 = -I_2.

I now develop a perturbative expansion of M in the parameters k^2 and ω,
expanding around the point ω = 0, k = 0. This approach is justified for computing
low momentum contributions to the loop integral. In particular, consider the term
which is first order in k^2 and zeroth order in ω. The most general form for this
term that is possible is θ(I_1I_2) k^2 d(E, I_1, I_2), where d is some function that I have
not yet determined. I will show in equation 6.99 that each term in the loop integral
contains at least one power of (I_1 - I_2)^{-1}. Since d(E, I_1, I_2, k^2) = d(E, I_2, I_1, k^2),
any analytic function d must have an even number of powers of (I_1 - I_2). These
points lead me to write the first order correction in k^2 as:



27r^9(/i/2) 

eA2(i?)(/i-/2)' 



k'd{E,h,l2) (6.92) 



There has been no loss of generality except that required by the points I just raised. 
The extra factor of 27r^^/A^ redefines d in order to give it the correct dimensions 
for a diffusion constant. The second property that I listed earlier implies that d is
real at I_1 = -I_2, which makes it natural to not include an explicit i in equation 6.92.

I can determine the other low order terms in the Taylor series expansion from
equation 6.91, which contains all the terms in the Taylor series which have k^2 to
the zeroth power. Doing this analysis, I find that to first order in k^2 and ω, but
neglecting the term which is proportional to ωk^2, the loop integral is:



27re(/i/2) , 27r9(/iJ2) , T:id{E,h,l2) ,^2 ■ ^ ^^ 

^ AC|/i-/2|+Aa/,-/2f^ A k +^s^gn[h){E,-E2)) 

(6.93) 



In summary, general arguments and the $k = 0$ integral have allowed me to
evaluate the loop integral $M$ at the lowest perturbative orders. The only thing I
did not calculate was the exact expression for the diffusion function $d$.

Now I will show how to do a thorough calculation of the loop integral given in
equation 6.90, with no assumptions about the size of $\omega$ other than that it is small
compared to $\xi I$. Examining equation 6.90, one finds that the interesting parts of the
integrand - the poles - are very small unless either $E_1 \approx E(p + k) \approx E(p) = \epsilon$ or
$E_2 \approx E(p - k) \approx E(p) = \epsilon$. Therefore $E(p \pm k) - \epsilon$ is bounded by either $\omega$ or the
change which $k$ makes in the kinetic energy, whichever is bigger. I assume that these
quantities are small compared to $\xi I$; therefore $E_i - \epsilon + i\xi I_i \gg (E(p \pm k) - \epsilon)$.
Doing a Taylor expansion in the small parameter and using the notation
$G_i = (E_i - \epsilon + i\xi I_i)^{-1}$, I obtain:

J (27rj 

X (Gi + Gi{E{p + k) - e)Gi + Gi{E{p + k) - e)Gi{E{p + k) - e)Gi)"' 

X (G2 + G2{E{p- k) - e)G2 + G2(S(p - k) - e)G2{E{p- k) - e)G2)"' 

(6.94) 
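The expansion in equation 6.94 is a truncated geometric series for a propagator. The sketch below checks numerically that the three-term series approaches the exact propagator with an error of third order in the energy shift. The names `width` (standing for $\xi I_i$) and `shift` (standing for $E(p \pm k) - \epsilon$) and all numerical values are illustrative assumptions, not quantities taken from the thesis.

```python
def green(e_i, eps, width):
    """Unperturbed propagator G_i = 1 / (E_i - eps + i*width)."""
    return 1.0 / (e_i - eps + 1j * width)


def exact_propagator(e_i, eps, width, shift):
    """Full propagator 1 / (E_i - eps + i*width - shift)."""
    return 1.0 / (e_i - eps + 1j * width - shift)


def expanded_propagator(e_i, eps, width, shift):
    """Truncated series G + G*shift*G + G*shift*G*shift*G."""
    g = green(e_i, eps, width)
    return g + g * shift * g + g * shift * g * shift * g


def truncation_error(shift, e_i=0.3, eps=0.1, width=1.0):
    """Distance between the exact propagator and its truncated series."""
    return abs(exact_propagator(e_i, eps, width, shift)
               - expanded_propagator(e_i, eps, width, shift))


if __name__ == "__main__":
    # Halving the shift should reduce the error by roughly a factor of 8.
    for shift in (0.1, 0.05, 0.025):
        print(shift, truncation_error(shift))
```

The series is only useful when the shift is small compared to the width, which is the assumption stated in the text.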
I again assume that $k$ makes only small changes in the kinetic energy of $p + k$.
Expanding the kinetic energy function perturbatively in powers of $k$ gives:

\[ E(p + k) \approx \epsilon + \nabla E \cdot k + \tfrac{1}{2}\, k \cdot \nabla\nabla E \cdot k \tag{6.95} \]

The notation $\nabla\nabla$ signifies the tensor second derivative
$\sum_{a,b} |a\rangle \frac{\partial}{\partial x_a} \frac{\partial}{\partial x_b} \langle b|$, and the vector
notation $\nabla$ just makes the fact that the gradient is a vector even more explicit.
Keeping only terms to second order in $k$, I obtain:

M = [ de-^^6{e-E{p))x[GiG2 + l{GlG2 + GiGl)k-VE-k 

+ {G\G2 + GxGl - G\Gl){VE ■ kf + {G\G2 - GiC?i)V£ • k] (6.96) 

The integral's $p$ dependence is now contained entirely in the derivatives of $E$,
so I am able to average these derivatives over the constraint $\epsilon = E(p)$. I define the
average as:

{0{E)) ^ A{E) J ^6{e - E{p))0{p) (6.97) 
Using this definition, I can finally do the integral over p: 

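\[
\begin{aligned}
M = \int d\epsilon\, \Delta^{-1}(\epsilon) \times \Big[ & G_1 G_2 + \tfrac{1}{2} (G_1^2 G_2 + G_1 G_2^2)\, k \cdot \langle \nabla\nabla E \rangle \cdot k \\
& + (G_1^3 G_2 + G_1 G_2^3 - G_1^2 G_2^2)\, (k \cdot \langle \nabla E\, \nabla E \rangle \cdot k) \\
& + (G_1^2 G_2 - G_1 G_2^2)\, \langle \nabla E \rangle \cdot k \Big] \\
= {} & M_0 + \tfrac{1}{2}\, k \cdot M_1 \cdot k + k \cdot M_2 \cdot k + M_3 \cdot k
\end{aligned}
\tag{6.98}
\]
In the last line I have simply broken up the one loop integral into its four parts.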

Before performing the integral over e, let me briefly estimate the effect of taking 
a derivative with respect to e. I will use a simple model, choosing E — -^fc^- With 

this model £{A-\e){VE)) = ^{A-\e){VEVE)) = '^"'-^'^'^ . This indicates 
that taking a derivative corresponds (at least when estimating orders of magnitude) 
to dividing by the kinetic energy. 

I now turn to the task of performing the contour integral over $\epsilon$. Where one
Green's function occurs to the n-th power and the other to the m-th power, the
contour integral will result in an expression like this:
$\frac{d^{n-1}}{d\epsilon^{n-1}} [\Delta^{-1}(\epsilon) \langle O(\epsilon) \rangle G^m(\epsilon)]$.

If the derivative acts on $G^m$, the resulting term will have an extra power of
$(E_1 - E_2 + i\xi(I_1 - I_2))^{-1}$. On the other hand, if the derivative acts on the
combination $\Delta^{-1}(\epsilon) \langle O(\epsilon) \rangle$, the effect will be more or less a division by a constant
proportional to the kinetic energy. I will simplify my life by assuming that the
kinetic energy is much smaller than the energy difference $(E_1 - E_2 + i\xi(I_1 - I_2))$,
and therefore neglect derivatives of the Green's functions. (I could evaluate $M$
without making this approximation, but the resulting expressions are considerably more
complicated.) Throwing out all terms with extra powers of $(E_1 - E_2 + i\xi(I_1 - I_2))$
in the denominator, I find that the contour integrals are:

Mo = 27:i{Ei-E2+t^iIi-l2)r^A-\Ei+iCh) 

M2 = m{E^ -E2+ t^ih - l2)r'^{^-\e){VEVE))l^^^^^^j^ 

M3 = -27r*(i?i-i?2 + ie(/i-/2))"'^(A-i(e)(Vii;»|,^^^+,^,^ (6.99) 

These are the results for $\mathrm{sign}(I_1) = 1$, $\mathrm{sign}(I_2) = -1$. When the signs of the
imaginary parts $I_1$ and $I_2$ are reversed, the results remain the same except for
two things: each of the four equations is multiplied by $-1$, and also $\Delta$ and the
derivatives are evaluated at $E_2 + i\xi I_2$ instead of $E_1 + i\xi I_1$. This second difference
should be substantial because the typical scale of change of $\Delta$ is proportional to the
kinetic energy. However, if we assume yet again that $\omega$ is small compared to $\xi I_1$,
$M$'s symmetry under interchange of $I_1$ and $I_2$ ensures that changing the sign of
the $I$'s has no effect on equations 6.99 except to change their sign. For simplicity
I will assume that $\omega$ is small compared to $\xi I_1$, but one could avoid this assumption
if one wished to treat the case of large $\omega$.

I assume that the kinetic energy is spherically symmetric; $M_3$ is zero and $M_1$
and $M_2$ are both proportional to the identity. Therefore I can write:

^ 2^7rg^ffn(/l)e(/l/2) , ttDP , 



A{Ei~E2+iah-l2))' A\h~l2\' 
D ^ A|/^_j,|A(_|(A-i(,)(vi?)) + A^(A-i(,)(Vi?Vi?») 

(6.100) 

I once again use the approximation that $\omega \ll \xi |I_1 - I_2|$ to obtain the formula:

27re(/i/2) 27re(/i/2) , 

(6.101) 

This is the same result as in equation 6.93, except that now I have an explicit
expression for the diffusion function $d(E, I_1, I_2)$, and now $d$ is manifestly real. In
the literature it is conventional to use a diffusion constant (not function) D which is 
real, is constant under interchange of Ii and I2, and does not depend on the energy 
E. The above derivation shows that these assumptions can be justified only in very 
specific limits, and that in general D is correctly understood as a function, not a 
constant. The diffusion constant D found in the literature is best considered as a 
phenomenological constant appearing in the effective Lagrangian and nothing more; 
the expression derived here should not be taken seriously except to demonstrate the 
overall physics. In fact the main value of this derivation was not its final result, but 
instead its explanation of how one would derive the loop integral $M$ for large $\omega$.

A few words about the validity of this integral. Because the integral includes
the factor $\theta(I_1 I_2)$ and the saddle point solutions imply that $I_1 = \pm I_2$, the quantity
$I_1 - I_2$ is proportional to $I_1$. In the course of evaluating the one loop integral I
have made the following approximations and assumptions: 
1. Spherical symmetry.

2. The diffusive approximation: ^7; ^ Ei. 

3. $k$ makes only small changes in the kinetic energy, allowing a perturbative
expansion in $k$. The change in kinetic energy can be estimated as $Dk^2$,
which has a typical value given by the Thouless energy $E_c$.

4. E.^^h^E,. 

5. w < < E^ 



6.5.5 The Sigma Model Lagrangian, Part II 

I now return to the task of evaluating the effective action of the Goldstone bosons. I
start by substituting the saddle point solutions, given by equation 6.83, into equation
6.101 for the one loop integral. Because I chose to evaluate the saddle point at
$E^f = 1 \otimes E$, the energy difference occurring in equation 6.101 is exactly zero. The
result of this substitution is:



Dk'^ 

M = e(s(»i)s(z2))r'(l + ^) (6.102) 

I remind the reader that the s variables are just the signs of the fermionic saddle 
points. 

Substituting this result into equation 6.89, I obtain:

Tr{A^ Gf{Qi)A^ {QI)) 

/fjOf T-)f2 
-—jj J2 A\^l^2,t ) A'{^2^l,-t) e(.(zi).fe))(l + ^) 

(6.103) 

I have taken the liberty of moving the factor of one half in the equation k — into 
the diffusion constant D. 

Because the $s$ variables have unit magnitude, $\theta(s(i_1) s(i_2)) = \tfrac{1}{2}(1 - s(i_1) s(i_2))$.
I define the sign matrix $S$ of the saddle points as the diagonal matrix whose entries
are composed of the $s$ variables:

\[ S(i_1 i_2) = s(i_1)\, \delta(i_1 i_2) \tag{6.104} \]
The new sign matrix allows me to rewrite equation 6.103 in matrix notation:

Tr{A^ Gf{Qi)A^ G^iQl)) 
^ 2eJ + ^)iTr{A\t) A'{-t)) -Tr{A\t) S A\~t) S)] 

(6.105) 
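The step from the index sum to the matrix form rests on one identity: weighting a double sum by $\tfrac{1}{2}(1 - s(i_1) s(i_2))$ is the same as taking half the difference of two traces, one of them with the diagonal sign matrix $S$ inserted twice. The sketch below checks that identity for generic matrices. The function names and the sample matrices are illustrative assumptions; the momentum arguments and prefactors of equation 6.105 are left out.

```python
def theta_weighted_sum(a, b, signs):
    """Sum over i1, i2 of a[i1][i2] * b[i2][i1] * (1 - s(i1) s(i2)) / 2."""
    n = len(signs)
    total = 0
    for i in range(n):
        for j in range(n):
            weight = 0.5 * (1 - signs[i] * signs[j])
            total += a[i][j] * b[j][i] * weight
    return total


def trace_difference(a, b, signs):
    """(Tr(A B) - Tr(A S B S)) / 2 with S = diag(signs)."""
    n = len(signs)
    trace_ab = sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))
    trace_asbs = sum(a[i][j] * signs[j] * b[j][i] * signs[i]
                     for i in range(n) for j in range(n))
    return 0.5 * (trace_ab - trace_asbs)


if __name__ == "__main__":
    signs = [1, -1, 1]
    a = [[1 + 2j, 0.5, -1j], [2.0, -0.3, 1 + 1j], [0.7j, 1.5, -2.0]]
    b = [[0.2, 1j, 3.0], [-1.0, 0.4 + 0.1j, 2j], [1.1, -0.6, 0.9j]]
    print(theta_weighted_sum(a, b, signs), trace_difference(a, b, signs))
```

Only pairs of indices whose signs differ contribute, which is why the surviving terms are the ones between saddle point eigenvalues of opposite sign.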

The traces in equation 6.105 are over the $i$ index only. I exploit the fact that one
factor of $A^f$ has a momentum of $k$ and the other has a momentum of $-k$ to turn the
$k^2$ into the $\nabla \cdot \nabla$ operator. Thus I am able to return to the original trace over
both the volume index $v$ and the $i$ index:

Tr{A^ Gf{Qi)A^ {Qi)) 

= J-TrlA^A^ - A^SA^S) + -^Trl^A^ ■ ^ A^ - V^^ S ■ VA^ S) 

(6.106) 

I now plug this result into expression 6.87 for the Lagrangian of the sigma model:

+ —Tr{VQ''L-VQ''L) 

+ '^Tr{VQf ■ VQf - VQfS ■ VQ^S) + ^TriVQ^ ■ VQO 

- iy^Tr{Q''Q^ + Q^LQ^L)-'^Tr{Qf + QfSQfS) (6.107) 

The terms on the last line of equation 6.107 are mass terms. It is instructive
to analyze them in terms of the individual matrix elements of $Q^b$ and $Q^f$. The
$Q^f$ mass term reads:

- Y E (1 + IQ^(«*i*2)|' (6.108) 

I have used the fact that $Q^f$ is Hermitian. The mass term for $Q^b$ is entirely
analogous, with $s(i)$ replaced by $L(j)$. When $s(i_1) \neq s(i_2)$, the term in parenthesis
evaluates to zero. The corresponding matrix elements $Q^f(v i_1 i_2)$ with $s(i_1) \neq s(i_2)$,
and $Q^b(v j_1 j_2)$ with $L(j_1) \neq L(j_2)$, have very small masses caused only by their
kinetic energy. These are precisely the degrees of freedom which do not change the
eigenvalues of $Q^f_0 + Q^f$ and $Q^b_0 L + Q^b L$.

The other matrix elements, $Q^f(v i_1 i_2)$ with $s(i_1) = s(i_2)$ and $Q^b(v j_1 j_2)$ with
$L(j_1) = L(j_2)$, are the very massive degrees of freedom. These are precisely the
degrees of freedom which change the eigenvalues of $Q^f_0 + Q^f$ and $Q^b_0 L + Q^b L$. If
we assume that the energy operators $\omega^f$ and $\omega^b$ and the Thouless energies $D^f k^2$ and
$D^b k^2$ are all small compared to $\xi$, then we can immediately integrate out these
very massive degrees of freedom, and fix the eigenvalues to have exactly the values
prescribed by the saddle point solutions. Thus we arrive at the sigma model.

I will avoid integrating out the translationally invariant degrees of freedom
$Q^f(p = 0)$ and $Q^b(p = 0)$. I make this choice because some observables (including
the two point correlator) evaluate to zero in the zeroth order saddle point
approximation. To evaluate the higher order corrections to the saddle point
approximation, one must postpone this integration until the last moment. Instead I will
explicitly separate the $p = 0$ degrees of freedom from the $p \neq 0$ degrees of freedom
with the notation change $Q^f \to Q^f(p = 0) + Q^f$, $Q^b \to Q^b(p = 0) + Q^b$. Henceforth
$Q$ refers only to the degrees of freedom which are not translationally invariant,
while $Q(p = 0)$ refers to the degrees of freedom which are translationally invariant
but do not obey the saddle point constraint.

The fermionic matrix contains (^(Tr{Syf' -\- jl^ degrees of freedom which 
violate the constraint on the eigenvalues, and after integration each of these degrees 
of freedom multiplies the generating function Z by a factor proportional to 
In contrast, one can estimate that the other degrees of freedom each multiply the 
generating function by a bigger factor: either or , depending on whether 
$\omega$ or $E_c$ is larger. Therefore the generating function depends on the trace of the
saddle point sign matrix roughly as:



Z (X exp {-^{Tr{Syf In {^/E,)) (6.109) 



The net result is that saddle points with the minimum value of $(\mathrm{Tr}(S))^2$ are favored
by exponentially large factors. The number of negative saddle point eigenvalues 
should be as close as possible to the number of positive saddle point eigenvalues. 



The saddle point sign matrix $S$ occurs in two other terms in equation 6.107.
After applying the saddle point constraint, these terms simplify. The resulting sigma
model Lagrangian is:



£a = Co + ivTr{Q^L{Co'' -J))- ivTr{Qf Cj^ ) 

+ '^TriVQ'L ■ VQ'L) + !^i£±^Tr(yQf ■ VQ^ (6.110) 



Recall that and were Wilson coefficients of the perturbative corrections 
to the effective Lagrangian. In contrast, D came from the original Lagrangian, but via
the approximations entailed in computing the low momentum behavior of the one 
loop integral. All three are phenomenological constants. I take the liberty of setting 
$Q^f$'s total diffusion constant to be equal to $Q^b$'s diffusion constant. I am now ready
to write the final sigma model. 
6.5.6 The Final Sigma Model
The final sigma model can be described briefly in only five points: 
1. The generating function is given by: 



£(p = 0) 



2. The matrix Q^[p = 0) contains all of Q^'s translationally invariant degrees of 
freedom. It is composed of two parts: the saddle point solution Qq, and the 
small perturbations Q''{p = 0). Qq can vary freely, with the constraint that 
its eigenvalues be equal to pi = tt^A"^. Q^{p = 0) is a small perturbation 
to Qq, and consists of the translationally invariant degrees of freedom which 
change the eigenvalues of . 

3. The matrix Q^{p = 0) contains all of Q^ 's translationally invariant degrees of 
freedom. It is composed of two parts: the saddle point solution Q^, and the 
small perturbations {p = 0). QI can vary freely, with the constraint that 
its eigenvalues be equal to qi = Si7r^A~^, Si = ±1. These eigenvalues qi 
are constrained to minimize {Tr{J2iSi))^ ■ Q^{p = 0) is a small perturbation 
to Qq, and consists of the translationally invariant degrees of freedom which 
change the eigenvalues of . 

4. The matrix $Q^b$ is composed of all the degrees of freedom which are not
translationally invariant, and is constrained to not change the eigenvalues of
$Q^b_0 + Q^b$. This constraint is the mechanism by which $Q^b$ interacts with the
other degrees of freedom.

5. The matrix $Q^f$ is composed of all the degrees of freedom which are not
translationally invariant, and is constrained to not change the eigenvalues of
$Q^f_0 + Q^f$. This constraint is the mechanism by which $Q^f$ interacts with the
other degrees of freedom.

The form of the Lagrangian $\mathcal{L}$ is very similar to that of the supersymmetric sigma
model [75]. When $I^b = I^f = 2$ and $J = 0$, this reads:

CsusY = '-^Str{AQ) + ^StriVQ-WQ) (6.112) 



= JdQ^{p = 0) dQ\p = 6) e^(P-°) 

X J dCtdQU^ 

= -'^Tr{iQf{p = 0)f) + - a)Tr\n{Ef + i^Qf{p = 0)) 

- '^Tr{Q\p = 0) L Q\p = 0) L) + ivTr{Q\p = 0) L (w*" - J)) 

+ vi{l-a){^)Tr\n{Q''{p = Q)L) 
A6 

+ "^TrXniE^ +iiQf{p = Q)+iiQ\p = Q)L) 

= ivTr{Q^L{iJj^ - J))-ivTr{Q^ujf) 

+ ^Tr(Vg''L • Vg''L) + ^Tr(VQ^' • VQ^) (6.111) 
However this similarity hides a huge difference in complexity: the $Q$'s in the
supersymmetric Lagrangian are graded matrices, and are quite challenging to manage.
Just changing $I^b$ and $I^f$ is very challenging, which is why the literature has confined
itself to $I^b = I^f = 1$ and $I^b = I^f = 2$. This is why the sigma model presented
here is so important.

A few words about the validity of this model, which is best understood in terms 
of energy scales. One of the important energy scales is the inverse scattering time
= = TT^^A^^. The assumptions used to derive this model were: 

1. Spherical symmetry. 

2. The dominant saddle points are translationally invariant. 

3. The final field theory is local. 

4. The diffusive approximation: r is much less than the particles' energy. 

5. w < r 

6. Ec<^T 

7. 1 ^ r/^, which implies that A ^ ^. This last inequality was necessary 
to justify the perturbative expansion of the Lagrangian; one can see this by 
estimating the next order corrections and seeing that they are divided by 
roughly t/^. 

8. $E_c \ll \xi$, and $\omega \ll \xi$. These assumptions were necessary to justify integrating
out the very massive modes. I would still have obtained a quite valid model 
without them. 

This completes my derivation of the new sigma model. 



6.6 The Density of States in a Zero Dimensional System

In this section I calculate the density of states $\rho(E)$ of a non-extended system
($V = 1$). Equation 6.9 indicates that I need to compute the average of the trace
of the advanced Green's function. I will obtain this by first computing the average
generating function $Z$ with $I^b = I^f = 1$, and then taking the first derivative.
Equation 6.24 implies that $L = -1$. When $V = 1$, $I^b = I^f = 1$, and $L = -1$,
equation 6.61 simplifies to:



7 / dq^dq^exp{C) 

-Y = C-N j^N+1/2 ?.N+2_}^_2l/2^1/2 

' ^ {N-iy. 

C = -^qf^ + {N-l)ln{i^qf + Ef) 

N 9 

- —q'' -tNq\E''-J)/^+{N~l)\liq'' 

+ \n{tCqf + E^' - i^q'') 
Ao = i^qf+Ef (6.113) 
$q^b$ and $q^f$ are just real numbers. When dealing with the non-extended system,
it is always helpful to immediately make a shift $q^f \to q^f + iE^f/\xi$ to obtain:



Z = -iV^+i/2_±_2i/2^-i/2exp(7V£;/Vc') 
dqUq\qf ~ g^) 

N 2 

X eyiY>{-—q^ -iNqfE^/C+{N ~l)liiqf) 

N 9 

X exp{-—q^ -iNq\E'' ^ J)/ £_ + {N -1) In q^) (6.114) 

Remember that I will calculate the Green's function by taking a derivative with
respect to $J$ and then setting $E^b = E^f$ and $J = 0$. When $E^b = E^f$ and $J = 0$, the
integrand of equation 6.114 is antisymmetric under interchanges of $q^b$ and $q^f$. The
only thing that is not antisymmetric is the limit of integration $q^b > 0$. I will shortly
be making the saddle point approximation, in which the limit of integration will be
neglected; in this approximation $Z(E^b = E^f, J = 0)$ is exactly zero. Therefore I
need to either retain information about $J$ in the saddle point approximation, or else
take the derivative with respect to $J$ now. I opt for the latter option, which just
multiplies the integrand by a factor of $iNq^b/\xi$.

I will evaluate the integral 6.114 by making the saddle point approximation in
both variables. But first I briefly review the saddle point approximation.



6.6.1 The Saddle Point Approximation: A Review 

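In order to be general, I assume a Lagrangian $\mathcal{L} = -Nq^2/2 + itNqE/\xi + N(1 - a) \ln q$,
with $t = \pm 1$. The saddle point equation for this Lagrangian is $\frac{d\mathcal{L}}{dq} = 0$, which
implies that $0 = -N(q - itE/\xi - (1 - a)/q)$ at the saddle point. This saddle point
equation has two solutions:

\[ q(s) = st \sqrt{1 - a} \times \exp(is\phi), \qquad s = \pm 1, \qquad \sin\phi = \frac{E}{2\xi\sqrt{1 - a}} \]

At the saddle point, the Lagrangian has second derivative
$\frac{d^2\mathcal{L}}{dq^2} = -2N \exp(-is\phi) \cos\phi$.

The value of the Lagrangian at the saddle point is:

\[
\mathcal{L}_0 = -\frac{N(1 - a)}{2} \exp(i2s\phi) + isN\sqrt{1 - a}\, \frac{E}{\xi} \exp(is\phi)
+ N(1 - a) \ln(st\sqrt{1 - a}) + N(1 - a)\, is\phi \tag{6.115}
\]
To the leading order in $N$, the saddle point approximation is given by:

\[
\begin{aligned}
\int dq\, f(q) \exp(\mathcal{L}(q)) & \approx \sum_s f(q(s)) \exp \mathcal{L}_0(q(s)) \int dq\, \exp\Big( \frac{q^2}{2} \frac{d^2\mathcal{L}}{dq^2}\Big|_{q = q(s)} \Big) \\
& = \sum_s \sqrt{\frac{\pi}{N\cos\phi}}\, \exp(is\phi/2)\, f(q(s)) \exp \mathcal{L}_0(q(s))
\end{aligned}
\tag{6.116}
\]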



If $f$ is zero at the saddle point, or perturbative corrections are desired, one instead
uses the formula:

\[
\begin{aligned}
\int dq\, f(q) \exp(\mathcal{L}(q)) = \sum_s \exp(\mathcal{L}_0(s)) \int dq\, & f(q) \exp(-q^2 N \exp(-is\phi) \cos\phi) \\
& \times \exp\Big( -N(1 - a) \sum_{j=3}^{\infty} j^{-1} (-q/q(s))^j \Big)
\end{aligned}
\tag{6.117}
\]

One evaluates this integral by first expanding the last exponential perturbatively in
powers of $q$; after the integration, each power of $q$ is weighted by a power of $N^{-1/2}$,
which justifies the perturbative expansion. However, the expansion can be expected
to diverge at high orders. Moreover, there are certain effects which are totally lost
in the expansion; for instance, when evaluating $Z$, the integration constraint $q^b > 0$
is not respected. In the saddle point approximation, this constraint's only effect is
to eliminate half of the bosonic saddle points.
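The saddle point solutions reviewed in this section can be checked numerically. The sketch below evaluates the first and second derivatives of the model Lagrangian $-Nq^2/2 + itNqE/\xi + N(1 - a) \ln q$ at $q(s) = st\sqrt{1 - a}\exp(is\phi)$ and compares the second derivative with $-2N\exp(-is\phi)\cos\phi$. The symbol `xi` for the energy scale dividing $E$, the function names, and the sample numbers are assumptions made for illustration; the check requires $|E| \le 2\xi\sqrt{1 - a}$ so that $\phi$ is real.

```python
import cmath
import math


def saddle_point(s, t, energy, xi, a=0.0):
    """Return q(s) = s t sqrt(1-a) exp(i s phi) and phi, sin(phi) = E / (2 xi sqrt(1-a))."""
    root = math.sqrt(1.0 - a)
    phi = math.asin(energy / (2.0 * xi * root))
    return s * t * root * cmath.exp(1j * s * phi), phi


def first_derivative(q, n, t, energy, xi, a=0.0):
    """dL/dq for L = -N q^2/2 + i t N q E/xi + N (1-a) ln q."""
    return -n * (q - 1j * t * energy / xi - (1.0 - a) / q)


def second_derivative(q, n, a=0.0):
    """d^2L/dq^2 for the same Lagrangian."""
    return -n * (1.0 + (1.0 - a) / q ** 2)


if __name__ == "__main__":
    n, energy, xi, a = 50, 0.8, 1.0, 0.1
    for s in (1, -1):
        for t in (1, -1):
            q, phi = saddle_point(s, t, energy, xi, a)
            claimed = -2 * n * cmath.exp(-1j * s * phi) * math.cos(phi)
            print(s, t, abs(first_derivative(q, n, t, energy, xi, a)),
                  abs(second_derivative(q, n, a) - claimed))
```

Both printed residuals should be at the level of floating point rounding for all four sign choices.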



6.6.2 Applying the Saddle Point Approximation 

I am only interested in the leading order, so I take $a = 0$. I take the derivative of
equation 6.114 with respect to $J$, set $E^b = E^f$ and $J = 0$, and apply equation
6.116's version of the saddle point approximation. Without the constraint $q^b > 0$
there would be 2 × 2 saddle points, two from the bosonic integral and two from
the fermionic integral. The constraint eliminates one of the bosonic saddle points,
leaving only two (1 × 2) saddle points. Here is the result:

Tr{GAE))^^ = ziV^+i/2,^-i_l_2i/2^i/2exp(iVi;V2e') 

E— s X ex-niisd)) — ex-pi— 16) 
^ X exp(— «0) 
^ ^cos (0)cos (0) 

N 

X exp(— — exp(j2s(/<) + isA^£^/^exp(is0) 

+ {N - l)ln(-s) + {N - l/2)is0) 

N 

X exp(-yexp(-i2(/)) - iN E / S,ex-p{-i(j)) - (TV - l/2)i(?!)) 

(6.118) 

Equation 6.118 gives the $s = -1$ saddle point a weight of zero. This is a
consequence of the integrand's original antisymmetry in $q^b$ and $q^f$, and of the fact
that the fermionic saddle point $s = -1$ is symmetric with the bosonic saddle point,
which has a (bosonic) index $s_b = -1$. One can evaluate higher order corrections
to the $s = -1$ saddle point by using the integral formulation of the saddle point,
equation 6.117. However, these corrections can be neglected to leading order in $N$.
Simplifying equation 6.118, I find:



Tr{GA{E)) ^ (-l)^-l7VA^+V2_l_23/2^1/2,^p (^^2/2^2) 

X ^exp(-i(?!))exp(-7Vcos(2(/.) - 2NE /^sm{(j))) (6.119) 
The last exponential can be reduced to $\exp(-N(1 + E^2/2\xi^2))$ by using the
definition of $\phi$, $\sin\phi = E/2\xi$, and various trigonometric identities. Stirling's formula
gives $(N - 1)! \approx \sqrt{2\pi} \exp(-N + (N - 1/2) \ln N)$. Applying these identities and
equation 6.9, I obtain:


The correct result would be: 




(6.120) 



(6.121) 



So my result is correct up to a multiplicative constant of $(-1)^{N-1}$. This phase is
probably caused by some personal error. I haven't had time to figure this out yet.
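The two identities used in this simplification can be verified independently of the rest of the calculation. The sketch below checks $\cos(2\phi) + 2E\sin\phi/\xi = 1 + E^2/2\xi^2$ for $\sin\phi = E/2\xi$, and measures how well Stirling's formula reproduces $\ln (N - 1)!$. The symbol `xi` and the function names are illustrative assumptions.

```python
import math


def exponent_identity_gap(energy, xi):
    """|cos(2 phi) + 2 E sin(phi)/xi - (1 + E^2 / (2 xi^2))| with sin(phi) = E / (2 xi)."""
    phi = math.asin(energy / (2.0 * xi))
    lhs = math.cos(2 * phi) + 2 * energy * math.sin(phi) / xi
    rhs = 1 + energy ** 2 / (2 * xi ** 2)
    return abs(lhs - rhs)


def stirling_log_error(n):
    """ln (N-1)! minus the logarithm of sqrt(2 pi) exp(-N + (N - 1/2) ln N)."""
    exact = math.lgamma(n)  # lgamma(N) = ln (N-1)!
    approx = 0.5 * math.log(2 * math.pi) - n + (n - 0.5) * math.log(n)
    return exact - approx


if __name__ == "__main__":
    print(exponent_identity_gap(0.8, 1.0))
    for n in (10, 100, 1000):
        # The leading correction to Stirling's formula is 1 / (12 N).
        print(n, stirling_log_error(n), 1.0 / (12 * n))
```

The Stirling error decays like $1/(12N)$, so it does not affect the leading order in $N$.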




6.7 The Two Point Correlator in a Zero Dimensional System

In this section I calculate the two point correlator $R_2(E_1, E_2)$ of a non-extended
system ($V = 1$). I will calculate this observable using equation 6.12, which I repeat
here:

R(E E)^ MRab) - RejRAA) 

First I will calculate the advanced-retarded correlator
$R_{AR} = \lim_{\epsilon \to 0} \mathrm{Tr}(G_A(E))\, \mathrm{Tr}(G_R(E))$, and then the advanced-advanced
correlator $R_{AA} = \lim_{\epsilon \to 0} \mathrm{Tr}(G_A(E))\, \mathrm{Tr}(G_A(E))$.



6.7.1 The Advanced-Retarded Correlator

I will obtain the advanced-retarded correlator by computing the average generating 
function Z and then taking the second derivative, as shown in the following equation: 

Rar = TrGR{E^)TrGA{E2) = j^jj-^Z{Ef ,&)\ Ef^EK.j=o (6-122) 

Because I am calculating the average of a product of two Green's functions, I
must choose $I^b = I^f = 2$; i.e. now the degrees of freedom will be 2 × 2 matrices.

Because one Green's function is advanced and the other retarded, the imaginary
parts of their energies have opposite sign; equation 6.24 implies that:

\[ L_{11} = 1, \qquad L_{22} = -1 \tag{6.123} \]

The sign of L could be reversed, but this doesn't matter. Remember that L is 
diagonal. 
When one chooses $V = 1$, $I^b = I^f = 2$, $L$ as in equation 6.123, and again
makes the shift $Q^f \to Q^f + iE^f/\xi$, equation 6.61 turns into:

Z = -f j dQfdQ''exp{C) 
Jq''>q 

7 = N^^+\{N-l)l{N-2)l)-^2\-''{-lf 

C EE ~—Tr{Q^)-iNTr{QiE^/C) + {N~2)Tr\iiQf 

_ ^Tr{Q''LQ''L)+tNTr{Q''L{E'' - J)/0 + {N - 2){Trln [Q^'L)) 
+ Tr\n{Qf + Q^L) (6.124) 

6.7.1.1 The Angular Integrals 

Now I'm confronted with two matrix integrals. When integrating over a matrix $M$,
it is frequently helpful to decompose the matrix into $M = U D U^\dagger$, where $D$ is the
diagonal matrix formed from $M$'s eigenvalues, and $U$ is the unitary transformation
required to diagonalize $M$. Having made this conceptual step, one then changes
variables from $dM$ to $dD\, dU$. This is the approach I will take with both $Q^b$ and
$Q^f$. All but two of the terms in the Lagrangian depend only on eigenvalues. The
two exceptions are: $iN\, \mathrm{Tr}(Q^b L (E^b - J)/\xi)$ and $-iN\, \mathrm{Tr}(Q^f E^f/\xi)$.

The matrix $Q^b$ is special because both of its eigenvalues are required to be
positive semidefinite. Because $Q^b$ is always accompanied by the matrix $L$, I will actually
decompose $Q^b L$ into eigenvalues and a transformation. Fyodorov [9] discussed the
properties of $Q^b L$ with a lot of clarity, and also explained how to decompose any
$2N \times 2N$ matrix $Q^b L$ where $N$ of $L$'s eigenvalues are +1 and the other $N$ are
−1. Probably this proof can be generalized easily to arbitrary combinations of +1
and −1. However, here I won't bother with the details of the general solution, but
instead just consider a 2 × 2 matrix.

As Fyodorov showed, $Q^b L$ is non-Hermitian, so it can not be decomposed into
$U D U^\dagger$. However, its eigenvalues are still real, one positive and the other negative,
so one can decompose $Q^b L = T P T^{-1}$, where $P$ is diagonal and $T$ is a special
"pseudo-unitary" matrix. $Q^b L$ has four degrees of freedom, while $P$ has two degrees
of freedom; $T$ should have two degrees of freedom in order to conserve degrees of
freedom. Because one wants to respect the special eigenvalue structure of $Q^b L$,
one chooses a special parameterization of $T$:

\[ T = \begin{pmatrix} \cosh\psi & \sinh\psi\, \exp(-i\theta) \\ \sinh\psi\, \exp(i\theta) & \cosh\psi \end{pmatrix} \]

Its inverse $T^{-1}$ is similar:

\[ T^{-1} = \begin{pmatrix} \cosh\psi & -\sinh\psi\, \exp(-i\theta) \\ -\sinh\psi\, \exp(i\theta) & \cosh\psi \end{pmatrix} \]

$\theta$ varies from 0 to $2\pi$, while $\psi$ varies from 0 to $\infty$. With this parameterization
of $T$, $Q^b L$ is parameterized as:

\[
\begin{pmatrix}
\tfrac{1}{2}(p_1 + p_2) + \tfrac{1}{2}(p_1 - p_2)\cosh(2\psi) & -\tfrac{1}{2}(p_1 - p_2)\sinh(2\psi)\exp(-i\theta) \\
\tfrac{1}{2}(p_1 - p_2)\sinh(2\psi)\exp(i\theta) & \tfrac{1}{2}(p_1 + p_2) - \tfrac{1}{2}(p_1 - p_2)\cosh(2\psi)
\end{pmatrix}
\]
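The parameterization can be checked by multiplying the matrices out. The sketch below forms $T P T^{-1}$ from the hyperbolic parameterization of $T$ and compares it entry by entry with the closed form in terms of $\cosh(2\psi)$ and $\sinh(2\psi)$. The function names and test values are illustrative assumptions.

```python
import cmath
import math


def matmul2(x, y):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def pseudo_unitary(psi, theta):
    """Return T and its inverse for the hyperbolic parameterization."""
    c, s = math.cosh(psi), math.sinh(psi)
    phase = cmath.exp(1j * theta)
    t = [[c, s / phase], [s * phase, c]]
    t_inv = [[c, -s / phase], [-s * phase, c]]
    return t, t_inv


def conjugated(p1, p2, psi, theta):
    """T P T^-1 with P = diag(p1, p2)."""
    t, t_inv = pseudo_unitary(psi, theta)
    return matmul2(matmul2(t, [[p1, 0.0], [0.0, p2]]), t_inv)


def closed_form(p1, p2, psi, theta):
    """The same matrix written with cosh(2 psi) and sinh(2 psi)."""
    half_sum, half_diff = 0.5 * (p1 + p2), 0.5 * (p1 - p2)
    ch, sh = math.cosh(2 * psi), math.sinh(2 * psi)
    phase = cmath.exp(1j * theta)
    return [[half_sum + half_diff * ch, -half_diff * sh / phase],
            [half_diff * sh * phase, half_sum - half_diff * ch]]


if __name__ == "__main__":
    print(conjugated(1.3, -0.7, 0.4, 1.1))
    print(closed_form(1.3, -0.7, 0.4, 1.1))
```

The trace and determinant of the result are $p_1 + p_2$ and $p_1 p_2$ for any $\psi$ and $\theta$, as they must be for a similarity transformation.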
Of course when you change variables you must also introduce a Jacobian; i.e.
a function which keeps track of the relation between the old variables and the new
ones. The most straightforward way to do this is simply to take derivatives of
the four old variables with respect to the four new variables, form these sixteen
derivatives into a 4 × 4 matrix, and then calculate the determinant of that matrix,
which is the desired Jacobian. This is straightforward; after a page of algebra I
obtain $J = \tfrac{1}{2}(p_1 - p_2)^2 \sinh(2\psi)$. The $(p_1 - p_2)^2$ dependence on the eigenvalues
is quite typical of matrix decompositions.
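The Jacobian can also be estimated numerically instead of by a page of algebra. The sketch below builds the 4 × 4 matrix of derivatives by central finite differences and compares the absolute value of its determinant with $\tfrac{1}{2}(p_1 - p_2)^2 \sinh(2\psi)$. It assumes that the four old variables are $Q_{11}$, $Q_{22}$, and the real and imaginary parts of $Q_{12}$ of the Hermitian matrix $Q^b = (Q^b L) L$ with $L = \mathrm{diag}(1, -1)$; that choice of old variables, like the function names, is an assumption of this sketch.

```python
import math


def old_variables(new):
    """Map (p1, p2, psi, theta) to (Q11, Q22, Re Q12, Im Q12) of Q^b."""
    p1, p2, psi, theta = new
    half_sum, half_diff = 0.5 * (p1 + p2), 0.5 * (p1 - p2)
    ch, sh = math.cosh(2 * psi), math.sinh(2 * psi)
    q11 = half_sum + half_diff * ch
    q22 = -(half_sum - half_diff * ch)  # second column of Q^b L flips sign under L
    re12 = half_diff * sh * math.cos(theta)
    im12 = -half_diff * sh * math.sin(theta)
    return [q11, q22, re12, im12]


def determinant(m):
    """Determinant by cofactor expansion along the first row (fine for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * determinant(minor)
    return total


def numerical_jacobian(new, step=1e-6):
    """|det| of d(old)/d(new), estimated with central differences."""
    rows = []
    for idx in range(4):
        plus, minus = list(new), list(new)
        plus[idx] += step
        minus[idx] -= step
        f_plus, f_minus = old_variables(plus), old_variables(minus)
        rows.append([(a - b) / (2 * step) for a, b in zip(f_plus, f_minus)])
    # rows[j][i] = d old_i / d new_j; transposition does not change the determinant.
    return abs(determinant(rows))


def closed_form_jacobian(new):
    """(p1 - p2)^2 sinh(2 psi) / 2."""
    p1, p2, psi, _ = new
    return 0.5 * (p1 - p2) ** 2 * math.sinh(2 * psi)


if __name__ == "__main__":
    point = [1.3, -0.7, 0.4, 1.1]
    print(numerical_jacobian(point), closed_form_jacobian(point))
```

A different convention for the measure on the off-diagonal element would change the numerical prefactor but not the $(p_1 - p_2)^2 \sinh(2\psi)$ dependence.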

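I now turn to the term $\mathrm{Tr}(Q^b L (E^b - J)/\xi)$, which is parametrized as:

\[
\frac{1}{2\xi}(E^b_1 + E^b_2 - J_1 - J_2)(p_1 + p_2) + \frac{1}{2\xi}(E^b_1 - E^b_2 - J_1 + J_2)(p_1 - p_2)\cosh(2\psi)
\tag{6.125}
\]

One must integrate over the angular variables $\psi$ and $\theta$. The integral is:

\[
\begin{aligned}
& \frac{1}{2}(p_1 - p_2)^2 \exp\Big( \frac{iN}{2\xi}(E^b_1 + E^b_2 - J_1 - J_2)(p_1 + p_2) \Big) \\
& \times \int d\psi\, d\theta\, \sinh(2\psi) \exp\Big( \frac{iN}{2\xi}(E^b_1 - E^b_2 - J_1 + J_2)(p_1 - p_2)\cosh(2\psi) \Big)
\end{aligned}
\tag{6.126}
\]

It is convenient to change variables to $x = \cosh(2\psi)$; $x$ varies from one to
infinity. Note that the $\psi$ integral converges because the imaginary parts of $E^b$ have
signs (+1, −1). The final result is:

\[
\frac{i\pi\xi\, (p_1 - p_2)}{N(E^b_1 - E^b_2 - J_1 + J_2)} \exp\big( iN\, \mathrm{Tr}(P(E^b - J))/\xi \big)
\tag{6.127}
\]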
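The change of variables $x = \cosh(2\psi)$ turns the $\psi$ integral into an elementary exponential integral: $\int_0^\infty d\psi\, \sinh(2\psi) \exp(-a \cosh(2\psi)) = \exp(-a)/(2a)$ whenever the real part of $a$ is positive. In the text $a$ is $-i$ times the coefficient of $\cosh(2\psi)$, so the positive real part comes from the imaginary parts of the energies. The sketch below compares a direct Simpson-rule quadrature in $\psi$ with the closed form; the cutoff `psi_max`, the step count, and the sample value of `a` are illustrative assumptions.

```python
import cmath
import math


def simpson(func, lower, upper, steps):
    """Composite Simpson rule with an even number of steps."""
    if steps % 2:
        steps += 1
    width = (upper - lower) / steps
    total = func(lower) + func(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lower + i * width)
    return total * width / 3.0


def psi_integral(a, psi_max=3.0, steps=6000):
    """Integral over psi of sinh(2 psi) exp(-a cosh(2 psi)), cut off at psi_max."""
    def integrand(psi):
        return math.sinh(2 * psi) * cmath.exp(-a * math.cosh(2 * psi))
    return simpson(integrand, 0.0, psi_max, steps)


def closed_form(a):
    """exp(-a) / (2 a), from x = cosh(2 psi) and dx = 2 sinh(2 psi) dpsi."""
    return cmath.exp(-a) / (2 * a)


if __name__ == "__main__":
    a = 1.0 - 2.0j  # real part > 0 guarantees convergence
    print(psi_integral(a), closed_form(a))
```

Without a positive real part the integrand would grow like $\sinh(2\psi)$ and the integral would not converge, which is the point of the remark about the signs of the imaginary parts.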



I now turn to the angular integration over $Q^f$ with integrand $\exp(-iN\, \mathrm{Tr}(Q^f E^f/\xi))$.
Since $Q^f$ is Hermitian, it can be decomposed into a diagonal part and a unitary part,
as mentioned above. There is a well-known formula for doing this integral, called the
Harish-Chandra-Itzykson-Zuber formula because of three people (Harish-Chandra,
Itzykson, and Zuber) who derived it. I will simply quote it as listed in Mehta's
book [76]:

\[
\int dU \exp\Big( -\frac{1}{2t} \mathrm{Tr}(A - U b U^{-1})^2 \Big) = c\, \frac{\det N}{\Delta(a)\Delta(b)},
\qquad \langle i|N|j\rangle = \exp\Big( -\frac{1}{2t}(a_i - b_j)^2 \Big)
\tag{6.128}
\]

$U$ is a unitary matrix; the integral is over all unitary matrices, and occurs when you
factorize a matrix $B$ into $B = U b U^{-1}$ and then integrate over $U$ before integrating
over the eigenvalues in the diagonal matrix $b$. $A$ is another Hermitian matrix, $a_i$ and
$b_i$ are the eigenvalues of $A$ and $b$, and $\Delta$ is the Vandermonde determinant
$\Delta(x) = \prod_i \prod_{j<i} (x_i - x_j)$. $t$ is a constant of one's choice, and $c$ is a normalization
constant which can be easily determined by making sure that the normalization
equation 6.35 holds. For 2 × 2 matrices, $c = \pi t/2$.
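For 2 × 2 matrices the Harish-Chandra-Itzykson-Zuber formula can be checked by direct quadrature, because the integrand depends on $U$ only through $x = |U_{11}|^2$, which is uniformly distributed on $[0, 1]$ under the Haar measure on U(2). The sketch below assumes the Haar measure is normalized to unit volume, in which case the constant is $c = t$; the thesis fixes its own normalization through equation 6.35, so its value of $c$ can differ by a constant factor. Function names and sample eigenvalues are illustrative assumptions.

```python
import math


def simpson(func, lower, upper, steps):
    """Composite Simpson rule with an even number of steps."""
    if steps % 2:
        steps += 1
    width = (upper - lower) / steps
    total = func(lower) + func(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lower + i * width)
    return total * width / 3.0


def haar_average(a, b, t, steps=2000):
    """Unit-volume Haar average over U(2) of exp(-Tr(A - U b U^-1)^2 / (2t))."""
    squares = a[0] ** 2 + a[1] ** 2 + b[0] ** 2 + b[1] ** 2

    def integrand(x):
        # Tr(A U b U^-1) written in terms of x = |U_11|^2.
        overlap = ((a[0] * b[0] + a[1] * b[1]) * x
                   + (a[0] * b[1] + a[1] * b[0]) * (1 - x))
        return math.exp(-(squares - 2 * overlap) / (2 * t))

    return simpson(integrand, 0.0, 1.0, steps)


def hciz_formula(a, b, t):
    """c det(N) / (Delta(a) Delta(b)) with c = t (unit-volume Haar measure)."""
    def entry(i, j):
        return math.exp(-(a[i] - b[j]) ** 2 / (2 * t))

    det_n = entry(0, 0) * entry(1, 1) - entry(0, 1) * entry(1, 0)
    vandermonde = (a[1] - a[0]) * (b[1] - b[0])
    return t * det_n / vandermonde


if __name__ == "__main__":
    a, b, t = (0.3, -0.5), (1.1, 0.2), 0.7
    print(haar_average(a, b, t), hciz_formula(a, b, t))
```

The reduction to a single variable is special to 2 × 2 matrices; for larger matrices one would need a genuine sampling of the unitary group.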



Recall that the term we are integrating came from a sum of squares:

\[ \exp\big( -\tfrac{N}{2} \mathrm{Tr}((Q^f + iE^f/\xi)^2) \big) \tag{6.129} \]

I skip the details and give the final result. $\exp(\pm iN\, \mathrm{Tr}(Q^f E^f/\xi))$ turns into:

±^^f^^^eM±'-^Triq(E( + 44)) (6.130) 
N{Ei - E^) t. 

To derive this result I had to use the problem's symmetry under switching the values
of $q_1$ and $q_2$. Note that there is no difference between this result derived with unitary
matrices and formula 6.127, which was derived using the special "pseudo-unitary"
matrices.

Substituting equations 6.127 and 6.130 into equation 6.124, I obtain the following
expression for the generating function in terms of integrals over the eigenvalues
of $Q^f$ and of $Q^b L$. I use the symbols $q_1$ and $q_2$ to represent the eigenvalues of $Q^f$,
and the symbols $p_1$ and $p_2$ to represent the eigenvalues of $Q^b L$.

Z = ^'^N^'^iiN -l)l{N -2y.)-^2^{~l)'^TT-^ 

X exp{^Tr{EfEf/e)){E( - Ei)'\El - E^ - Ji + 

X / dqidq2 / dpidp2{qi ~ <?2)(pi -P2)(<7i + Pi) 

J J pi>0,p2<0 

X((?l +P2){q2 +Pl)(<?2 +P2)cxp(/:) 

C = -^{ql + ql)-'-^{qiEi+q2Ei) + {N ~2){\nqi+\nq2) 

- ^{p\+pD + Y^Pi{E\ - Ji)+P2{El - J2)) + (A^- 2)(lnpi +lnp2) 

(6.131) 

Now I change the sign of $p_1$ and $p_2$. I also take the derivatives with respect to the
source $J$, ignoring the derivative of the prefactor $(E^b_1 - E^b_2 - J_1 + J_2)^{-1}$ because
it has a subleading order in $N$. Lastly, I set $E^f_1 = E^b_1 = E_1$, $E^f_2 = E^b_2 = E_2$, and
$J_1 = J_2 = 0$.

Rar = ^^^ ^ {-!)'' N^'^+^iN-iy.iN- 2y.)-'2\-'cMYTriEVe))iEi-E2)- 
X / dqidq2 / dpidp2{qi - q2){pi - P2){qi - Pi) 

J J pi<0,p2>0 

{qi - P2){q2 - pi){q2 - P2)piP2 

X exp(-y (g? + g|) - ^(giSi + q2E2) + {N- 2){lnqi + In 92)) 



X exp(-y(p2 +ply'^(p^Ei +P2E2) + {N-2){lnpi+lnp2)) 



I will drop the factor of $(-1)^N$ because I believe it is in error.



(6.132) 



6.7.1.2 The Saddle Point Approximation 

Equation 6.132 should look familiar; the equations are almost the same as equation
6.114 from the calculation of the density of states, except that now there are two $p$
variables and two $q$ variables. Therefore, I can apply the saddle point approximation
exactly as explained in section 6.6.1. The requirements that $p_1$ must be negative
and $p_2$ positive imply that $p_1 = -e^{i\phi_1}$ and $p_2 = e^{-i\phi_2}$, leaving four fermionic saddle
points. I will specify these saddle points with the indices $s_1$ and $s_2$. The integrand
(other than the factor of $p_1 p_2$ coming from the derivatives with respect to the
source $J$) is again antisymmetric, but now it is antisymmetric under interchange of
any two variables. Substituting in equations 6.115 and 6.117 and omitting factors
of $(1 - a)$, I obtain:



Rar = N^^'+^iN ~1)\{N ~2)\)-'2'^-\~lfeM^Tr{&/e)){Ei-E2r^ 

EN 
exp( exp(i2si(/)i) + isiN Ei/ ^e-xp{isi(j)i) + {N — 2)ln(— si) 
Sl,S2 2 

N 

X exp( — -e-x^(i2s24>2) + iS2N E2 / £,eyiY){iS24>2) + {N - 2)ln(-S2) 

+ {N - 2)lS24>2) 

N 

X exp( — -eyiY>{i2(j)i) + iN Ei / £^eyi^{i(j)i) + {N - 2)i0i) 
N 

X exp(-yexp(-«202) - iN E2/ £,eyi]i{~i(t)2) - [N - 2)i(t)2) 

X j dpidp2dqidq2exp{—NqiCOS (j)iexp{isi(l)i))exp{— Nq^cos (t'2exp{iS24>2)) 

X exp(— A^p^cos ^iea;p(— i0i))exp(— iVp2C0S (t)2^xp{i(f)2)) 

X T, 

T = (-si X exp(zsi(?!)i) + S2 x exp(zs2(?!>2) + gi - g2)(-exp(i(/)i) + exp(-Z(?!)2) + pi - P2) 

X (-si X exp(isi</)i) +exp(«(/)i) + gi -pi)(-,si x exp(«si0i) - exp(-«02) + qi - P2) 

X (-S2 X exp(iS2</)2) +exp(«(/)i) + 92 -pi)(-S2 x exp(«S202) -exp(-«02) +92 - P2) 

X (-e^"^^ +pi)(e-''^^ +P2) (6.133) 

The way forward is straightforward but tedious: it consists of evaluating each 
of the four saddle points using techniques from perturbation theory. 

6.7.1.3 The -+ saddle point 

I define $\phi = (\phi_1 + \phi_2)/2$ to be the average of the two angles $\phi_1$ and $\phi_2$, and
$\delta = \phi_1 - \phi_2$ to be their difference. The integrand $T$ simplifies to:

(2cos0exp(— 2(5/2) + gi — q2)(— 2cos0exp(2(5/2) +pi — p2)(2cos0i + qi — pi) 
X (-2isin ((5/2)exp(-i(/)) + 171 - p2)(2isin ((5/2)exp(z0) + 52 -Pi)(-2cos02 +92 - P2) 
X (-e"^i +pi)(e-*^^ +P2) (6.134) 

In order to simplify the calculation, I will assume that the difference $\omega = E_1 - E_2$
between $E_1$ and $E_2$ is small, of order $\xi/N$. Therefore $\delta$ is also small, of order $1/N$,
as is $\sin\delta$. I will do power counting, evaluating only the terms which are highest
order in $N$, will drop factors of $\exp(i\phi)$, and will also freely interchange $\cos\phi_1$,
$\cos\phi_2$, and $\cos\phi$.
First I note that the pie^*"^^ term in the last line gives rise to four leading 
order terms that look like ±4pjx^cos^0e~"^^ , where x represents either qi or p2. 
However, to highest order in N all the cosines can be freely interchanged, as can 
all the x'^'s. The signs of the four terms add to zero, so these four terms can 
be neglected. Another four cos^ terms originating from the — ^26**^^ term can be 
neglected for the same reason. Similarly, the —6**^16""^^ term gives rise to twelve 
leading order cos'^ terms and eight leading order cos^sin terms, all of which cancel 
out. 

The remaining highest order terms are: 

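$$
\begin{aligned}
16\cos^2\phi\,\cos\phi_1\cos\phi_2 \times \big(\, & p_1^2 p_2^2 + 2i p_1^2 \sin(\delta/2)\, e^{-i\phi} e^{-i\phi_2} \\
& + 2i p_2^2 \sin(\delta/2)\, e^{i\phi} e^{i\phi_1} - 4\sin^2(\delta/2)\, e^{i\delta} \,\big)
\end{aligned}
\tag{6.135}
$$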

Integration over pi,p2,qi,q2 converts this to: 

IGn^N-^e^'^ x {1 + iNsm{S/2)cos<j}2exp{-i'$) 

+iNsm{S/2)cos(t)iexp{i(j)) - 4iV^sin^((5/2)cos 0icos (?!)2) 

(6.136) 

At this point it is helpful to remember that $\phi$ is defined by $\sin\phi = E/2\xi$;
differentiating, I obtain $d\phi/dE = (2\xi\cos\phi)^{-1}$. Remembering that $\sin\phi \approx \phi$ for small
$\phi$, I find the simplified expression:

16.2^-4(1 +,eos(0)^-(^)') (6.137) 
Substituting this result into the saddle point formula of equation 6.133, I have:



= 6AnN^''-'i{N-mN~2y.)-'eM^iE! + El)) 



X ^ '(- + Ucos(0)— - ( — ) ) 
N 

X exp(-yexp(-i2(/«i) - z7Vi;i/^exp(-i0i) - (iV - 2)«0i) 
N 

X exp(-yexp(i202) + tN E2 / ^exp{i,p2) + {N - 2)t(t>2) 
N 

X exp(-yexp(i20i) + iN Ei / ^e-Kp{i(j)i) + {N ~ 2)z0i) 
N 

X exp(-yexp(-z2</>2) - iNE2/^exp{-i(l)2) - {N - 2)«02) 

(6.138) 

The new notation means that the $-+$ saddle point of the Advanced-Retarded
correlator evaluates to this expression. The $(N-2)i\phi$ phase factors cancel,
leaving the following:



RJ+ = MnN^''-\{N-mN-2)\)-\M^{El+El)) 
X w 1^ +icos(0)— - ( — ) ) 



X exp(-7Vcos(2(/)i) - 2NEisiii(l)i/^ - iVcos(202) - 2NE2Sm(l)2/0 

(6.139) 
Using again the Stirling approximation and the identity
$\cos(2\phi) + 2E\sin\phi/\xi = 1 + E^2/(2\xi^2)$, this reduces to:



R-J+ = 8c.-2(l + 2zcos(0)^-(^)') (6.140) 

6.7.1.4 The +- saddle point 
The integrand T simplifies to: 

$$
\begin{aligned}
&(-2\cos\phi\, e^{i\delta/2} + q_1 - q_2)(-2\cos\phi\, e^{i\delta/2} + p_1 - p_2)(q_1 - p_1) \\
&\times (-2\cos\phi\, e^{i\delta/2} + q_1 - p_2)(2\cos\phi\, e^{i\delta/2} + q_2 - p_1)(q_2 - p_2) \\
&\times (-e^{i\phi_1} + p_1)(e^{-i\phi_2} + p_2)
\end{aligned}
\tag{6.141}
$$

This fermionic saddle point matches the bosonic saddle point. As a consequence
the first two lines of equation 6.141, taken together, have two antisymmetries: one
under interchange of $p_1$ and $q_1$, the other under interchange of $p_2$ and $q_2$. Another
consequence is that $p_1^2$ and $q_1^2$ average to the same value, as do $p_2^2$ and $q_2^2$. Therefore
three terms in the factor $(-e^{i\phi_1} + p_1)(e^{-i\phi_2} + p_2)$ are identically zero, leaving only
the term $p_1 p_2$. The highest order term is:

$$ -16\, p_1^2 p_2^2 \cos^4\phi\, \exp(2i\delta) \tag{6.142} $$

Integration over $p_1, p_2, q_1, q_2$ converts this to:

$$ -4\pi^2 N^{-4} \exp(3i\delta) \tag{6.143} $$

Substituting this result into the saddle point formula of equation 6.133 and
neglecting the factor of $\exp(3i\delta)$, I have:



R\-^ = -lQ^N''^-^u:-\{N-mN~2)\)-\M^{El+El)) 
N 

X exp(-yexp(i20i) + iN Ei/S^eyip{i(j)i) + {N ~ 2.)i(j)i) 
N 

X exp(-yexp(-22<?!)2) - iN E2 1 S,eyip[~i(l)2) - [N - 2)i<j)2) 
N 

X exp(-yexp(«20i) + iN Ei / ^exp{i(j)i) + {N - 2)i(j)i) 
N 

X exp(-yexp(-i2(?!)2) - iNE2/^exp{~i<f)2) - {N ^ 2)1^2) 

(6.144) 

This time the $(N-2)i\phi$ phase factors do not cancel, leading to the following
expression after application of $E = 2\xi\sin\phi$ and some trigonometric identities:



X exp(22(7V - 2)5 -2N + iN sin (20i) - iN sin (2(/)2)) 

(6.145) 
If I throw away terms of order $1/N$, $2i(N-2)\delta + iN\sin(2\phi_1) - iN\sin(2\phi_2)$
simplifies to $2iN\delta(1 + \cos 2\phi) = 4iN\delta\cos^2\phi$. Using again the Stirling approximation
and the identity $\omega \approx 2\xi\delta\cos\phi$, I obtain the final expression:

$$
\begin{aligned}
R_{AR}^{+-} &= -8\omega^{-2}\exp(2iN\omega\xi^{-1}\cos\phi) \\
&= -8\omega^{-2}\left(1 - 2\sin^2(N\omega\xi^{-1}\cos\phi) + 2i\sin(N\omega\xi^{-1}\cos\phi)\cos(N\omega\xi^{-1}\cos\phi)\right)
\end{aligned}
\tag{6.146}
$$
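
The small-$\delta$ simplification of the phase used just above can be checked numerically. The following Python lines are only a sketch of such a check; the function names and the sample values of $N$ and $\phi$ are assumptions made for the illustration and do not come from the derivation itself.

```python
import math

def exact_phase(N, phi, delta):
    """2(N-2)*delta + N*sin(2*phi1) - N*sin(2*phi2), with phi1,2 = phi +/- delta/2."""
    phi1 = phi + delta / 2
    phi2 = phi - delta / 2
    return 2 * (N - 2) * delta + N * math.sin(2 * phi1) - N * math.sin(2 * phi2)

def leading_phase(N, phi, delta):
    """Leading-order form 4*N*delta*cos(phi)^2."""
    return 4 * N * delta * math.cos(phi) ** 2

N = 1000          # assumed basis size
phi = 0.5         # assumed mean angle
delta = 1.0 / N   # delta is taken to be of order 1/N

lead = leading_phase(N, phi, delta)
relative_error = abs(exact_phase(N, phi, delta) - lead) / abs(lead)
print(relative_error)  # the discarded terms are of relative order 1/N
```

The two expressions differ only by the terms of order $1/N$ that are thrown away in the text.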

6.7.1.5 The ++ saddle point 
The integrand T simplifies to: 

$$
\begin{aligned}
&(-2i\sin(\delta/2)\, e^{i\phi} + q_1 - q_2)(-2\cos\phi\, e^{i\delta/2} + p_1 - p_2)(q_1 - p_1) \\
&\times (-2\cos\phi\, e^{i\delta/2} + q_1 - p_2)(2i\sin(\delta/2)\, e^{i\phi} + q_2 - p_1)(-2\cos\phi_2 + q_2 - p_2) \\
&\times (-e^{i\phi_1} + p_1)(e^{-i\phi_2} + p_2)
\end{aligned}
\tag{6.147}
$$

This saddle point retains an antisymmetry in the $p_1$ and $q_1$ variables, so the last
line of equation 6.147 reduces to $p_1(e^{-i\phi_2} + p_2)$. The highest order term is:

$$ 8 p_1^2 (q_1^2 - q_2^2)\cos^2\phi\,\cos\phi_2\, \exp(i(\delta - \phi_2)) \tag{6.148} $$

After integration, cancellation between q^ and q^ adds another factor of 
which causes this term to be of order N~^; I will neglect it in the remaining 
calculations. 

6.7.1.6 The - - saddle point 
The integrand T simplifies to: 

(-2isin ((5/2)exp(-i(/)) + qi - (72)(-2coS(^exp(i(^) +pi - p2){2coscj)i + qi - pi) 
X (-2isin ((5/2)exp(-i(/)) + qi - p2)(2cos (/)exp(z0) + 92 - Pi){q2 - P2) 
X i-e'"^^ +pi){e-"^' +P2) (6.149) 

This saddle point retains an antisymmetry in the $p_2$ and $q_2$ variables, so the last
line in equation 6.149 reduces to $p_2(-e^{i\phi_1} + p_1)$. The highest order term is:

8^2(^2 — (Zi)cos^0cos 0iexp(— i^i) (6.150) 

Again there is an extra cancellation which renders this saddle point negligible.

6.7.1.7 The Total 

Adding together the results of the saddle points, i.e. equations 6.146 and 6.140, I
obtain the final result for the advanced-retarded correlator:

Rar = Su-^{l + 2icos{<i))— -{ — ) ) 

- 8w"^(l - 2sin^(iVw^"^cos^) + 2zsin(iVw^"^cos^)cos(7Vu;^"^cos^)) 

(6.151) 
6.7.2 The Advanced-Advanced Correlator

Unfortunately I was unable to take the time to get all the details of this calculation 
right before the thesis was due. In particular, once I get to evaluating the four 
saddle points, the errors become very frequent. However, I believe that the errors 
are only sign errors, and the final result confirms that. 

Because both Green's functions are advanced, equation 6.24 implies that:

$$ L_{11} = -1, \quad L_{22} = -1 \tag{6.152} $$

Remember that L is diagonal. The only difference between this calculation and 
the previous Advanced-Retarded calculation is that in this case the matrix L is 
proportional to the identity. 

When one chooses V = \, = P = 2, L -as in equation 16.1521 flips the sign 
of , and again makes the shift Qf + i& jf^, equation I6.61l turns 

into: 

Z ^ J [ dQf dQ^exp{C) 

C = -^Tr{Qf^)-iNTr{QfEf/^) + {N-2)Tr\nQf 

- ^Tr{Q'>Q'')+iNTr{Q\E'' - J)/0 + {N - 2)(Tr In {Q')) 

+ Tr\n{Qf + Q^) (6.153) 

The only formal difference between equation 6.153 and equation 6.124 is the fact that
$L = -1$.

I use the symbols $q_1$ and $q_2$ to represent the eigenvalues of $Q^f$, and the symbols
$p_1$ and $p_2$ to represent the eigenvalues of $Q^b$.

Z = e^A^^^((iV-l)!(iV-2)!)"^227r-i 

X eM^Tr{EfEf/^2^)i^Ei - Ei)~\E\ - E^ - J, + J^)"' 

X / dqidq2 / dpidp2{qi - q2){p\ - P2){q_\ +P1) 

J J pi<0,p2<0 

X((7l +P2)(g2 +Pl)((72 +?32)exp (£) 
>C = -y(9?+g^)-y(<Zi£;( + g2-B|) + (A^-2)(lngi+lng2) 

- f (P? +P2) + y (Pi(i^i' - Ji)+P2{E'2 - J2)) + {N- 2)(lnpi + \np2) 

(6.154) 

Now I change the sign of $p_1$ and $p_2$. I also take the derivatives with respect to the
source $J$, ignoring the derivative of the prefactor $(E^b_1 - E^b_2 - J_1 + J_2)^{-1}$ because
it has a subleading order in $N$. Lastly, I set $E^f_1 = E^b_1 = E_1$, $E^f_2 = E^b_2 = E_2$, and
$J_1 = J_2 = 0$.

AT 

Rar^^^^ ^ N^^'+^iN -mN ^2)\r'2^n-^eM-l^Tr{E^ie)){E^-E,)-^ 
X / dqidq2 / dpidp2{qi - q2){pi ~ P2){qi - Pi) 

J J pi>0,p2>0 

(<?1 -P2){q2 -Pl){q2 ~P2)PlP2 

X exp(-y (g2 + ql)-'Jl[q^Ei+ q2E2) + {N - 2)(ln(?i + Inga)) 

X exp(-y (p2 + pjy'^^p^Ei +P2E2) + {N~ 2)(lnpi + Inps)) 

(6.155) 

I can apply the saddle point approximation exactly as explained in section 6.6.1.
The requirement that both $p_1$ and $p_2$ must be positive implies that $p_1 = e^{-i\phi_1}$
and $p_2 = e^{-i\phi_2}$, leaving four fermionic saddle points. I will specify these saddle
points with the indices $s_1$ and $s_2$. The integrand (other than the factor of $p_1 p_2$
coming from the derivatives with respect to the source $J$) is again antisymmetric,
but now it is antisymmetric under interchange of any two variables. Substituting in
equations 6.115 and 6.117 and omitting factors of $(1 - a)$, I obtain:

Raa = N'^+'iiN-iy.{N~2y.r'2'n~'eMY'^riEye))iEi-E2r^ 

EN 
exp(-— exp(z2si0i) + isiNEi/^ex.p{tsi(t)i) + {N - 2)ln(-si) + (N - 2)isi0i) 
Sl,S2 2 

N 

X exp(-yexp(i2s2<?!'2) + zs2iVi;2/^exp(iS202) + (N ~ 2)ln(-S2) + (N ~ 2)iS2(f>2) 
N 

X exp(-yexp(i20i) + iNEi/^e^p{t<pi) + {N - 2)t(f>i) 
N 

X exp(-yexp(i202) + iNE2/^e^p{t<p2) + {N - 2)t(f>2) 

dpidp2dqidq2exp{—Nq\cos (l)iexp{isi(j)i))exp{— N q^cos 4>2Gxp{iS24>2)) 

X exp(— A^p^cos 0iea;p(— Z(/)i))exp(— A^pjcos 02exp(— 1^2)) 

X T, 

T = (-si X e-x:i>{isi(j)i) + S2 x exp(2S202) + 9i - g2)(exp(-Z(?!)i) - exp(-i02) +Pi - P2) 

X (-si X exp(isi0i) - exp(-«(/)i) + qi -_pi)(-si x exp(«si(/)i) - exp(-i(?!)2) + qi - P2) 

X (-S2 X exp(iS202) - exp(-«0i) + 92 -l5i)(-.S2 X exp(«S2'/'2) - exp(-i(?!)2) + 92 - ^2) 

X (e'-^i +pi)(e-*^^ +P2) (6.156) 

6.7.2.1 The + + saddle point 
The integrand T simplifies to: 

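$$
\begin{aligned}
&(-2i\sin(\delta/2)\, e^{i\phi} + q_1 - q_2)(-2i\sin(\delta/2)\, e^{-i\phi} + p_1 - p_2)(-2\cos\phi_1 + q_1 - p_1) \\
&\times (-2\cos\phi\, e^{i\delta/2} + q_1 - p_2)(-2\cos\phi\, e^{-i\delta/2} + q_2 - p_1)(-2\cos\phi_2 + q_2 - p_2) \\
&\times (e^{-i\phi_1} + p_1)(e^{-i\phi_2} + p_2)
\end{aligned}
\tag{6.157}
$$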

Just the same as in the case of the advanced-retarded correlator's $-+$ saddle
point, at the highest order in $N$ there are twenty $\cos^3$ terms and eight $\cos^4 \sin$
terms which all cancel to highest order in $N$. The remaining highest order terms
are:

32i{e'"^^pl - e*'*ip2)cos^0cos0icos02sin((5/2)exp (-Z0) - 64sin^((5/2)cos^0cos 0icos 02 

(6.158) 

The first term is of subleading order and will be neglected. Integration over
$p_1, p_2, q_1, q_2$ converts the second term to:

-647^2^^-2g2^0gJ^^2(^^^2)cos0lCOS02 (6.159) 
Again simplifying with $d\phi/dE = (2\xi\cos\phi)^{-1}$, I obtain:



Substituting this into equation 6.156 gives:
exp(-
exp(-
T


exp(- 


N 


T 



(z202) + iNE2/^cicpii(t,2) + iN~ 2)i02) (6.161) 

Using Stirling's formula, et cetera, results in the final expression:

$$
\begin{aligned}
R_{AA} &= -8N^2\xi^{-2} e^{2i\phi} \\
&= -8N^2\xi^{-2}\left(2\cos^2\phi - 1 + 2i\cos\phi\sin\phi\right)
\end{aligned}
\tag{6.162}
$$

We will see that this is the only leading order contribution to $R_{AA}$.

6.7.2.2 The - - saddle point 
The integrand T simplifies to: 

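$$
\begin{aligned}
&(-2i\sin(\delta/2)\, e^{-i\phi} + q_1 - q_2)(-2i\sin(\delta/2)\, e^{-i\phi} + p_1 - p_2)(q_1 - p_1) \\
&\times (-2i\sin(\delta/2)\, e^{-i\phi} + q_1 - p_2)(2i\sin(\delta/2)\, e^{-i\phi} + q_2 - p_1)(q_2 - p_2) \\
&\times (e^{-i\phi_1} + p_1)(e^{-i\phi_2} + p_2)
\end{aligned}
\tag{6.163}
$$

This saddle point is obviously negligible.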

6.7.2.3 The +- saddle point 

The integrand T simplifies to: 

(-2cos0exp(2(5/2) + gi - g2)(-2isin ((5/2)exp(20) +pi - P2){qi - Pi) 
X (-2jsin(5/2)exp(j0) +171 - p2)(2cos 0exp(2(5/2) + 52 -Pi)(2cos02 + 92 - P2) 
X (e^^i +pi)(e"^^ +P2) (6.164) 

This saddle point retains an antisymmetry in the $p_1$ and $q_1$ variables, so the last
line reduces to $p_1(e^{-i\phi_2} + p_2)$. The highest order term is:

8p?(gi - P2)cos^(?;>cos02exp(i02) (6.165) 
This saddle point is not of leading order and will be neglected. 



6.7.2.4 The -+ saddle point 
The integrand T simplifies to: 

(2cos(/)exp(— 2(5/2) + qi — q2){—2is'm (5/2)exp(z0) + pi — p2)(2cos(/)i + qi — 
X (2cos(/)exp(-2(5/2) + qi - p2)(2isin (5/2)exp(z0) + q2 - Pi)iq2 - P2) 
X (e"^i +pi)(e"^^ +P2) (6. 

This saddle point retains an antisymmetry in the P2,q2 variables, so the last line 
reduces to p2(— e*"^^ +Pi)- The highest order term is: 

8pl{ql — p^)cos^0cos0iexp(«0i) (6.167) 

This saddle point is not of leading order and will be neglected. 



6.7.3 The Final Result for the Two Point Correlator 

I am finally ready to calculate the two point correlator, given by equation 6.12:

$$ R_2(E_1, E_2) = \frac{\mathrm{Re}(R_{AR}) - \mathrm{Re}(R_{AA})}{8\pi^2 \rho(E_1)\rho(E_2)} $$

Equation 6.162 for the Advanced-Advanced correlator gives me:

_ Re{RAA) _ 2N^cos^ 



(6.168) 



I have used equation 6.6 for the level density in the second step.

Equation 6.151 for the Advanced-Retarded Correlator, with equation 6.6, gives
me:
Re{RAR) , I , 



Sn^p"^ V2tj2p2 T,2^2p2J 

- i^-'-^^^P^) (6.169) 

The terms in the first parenthesis came from the $-+$ saddle point, while the terms
in the second parenthesis came from the $+-$ saddle point.

The sign of the Advanced-Advanced correlator seems wrong, so I change the
sign, in keeping with my sloppiness earlier, particularly while evaluating the saddle
points of the Advanced-Advanced correlator.

Combining these results, I obtain:

$$ R_2(E_1, E_2) = -2\left(1 - \frac{\sin^2(\pi\omega\rho)}{\pi^2\omega^2\rho^2}\right) \tag{6.170} $$

The correct result was given in equation 6.8.

I already explained that the $\delta$ function in the correct result does not have any
real physical significance. I haven't tracked down exactly why it doesn't appear in
this analysis, but I suspect that somewhere I assumed that $E$ is invertible. There is
also an error of an overall factor of $-2$, plus the fact that I had to change the sign
of one of the contributions, plus the fact that I dropped factors of $(-1)^N$ at various
points, but I believe that these problems are due to my sloppiness and mistakes,
not to deficiencies in the formalism.
Chapter 7

O(N) Algorithms



7.1 Basic Concepts

Most $O(N)$ algorithms are really very simple to understand. Here I review their three
basic concepts: a localized basis, basis truncation, and generalized multiplication.

7.1.1 A Localized Basis 

All $O(N)$ methods use a localized basis, and assume that the system is best
described in terms of this basis. I here define a basis as localized if for any basis
element $|\psi\rangle$, only a small number of positions $x$ satisfy $\langle x|\psi\rangle \neq 0$. In particular,
plane wave bases are excluded. Note that crystal calculations are still possible, by
using a free propagator that properly includes the crystal lattice structure. The
theory of such propagators, called KKR band theory, was developed by Korringa,
Kohn, and Rostoker [31,32].

In tandem with the localized basis, all $O(N)$ algorithms require a distance metric
for quantifying the physical distance between any two basis states.

7.1.2 Basis Truncation 

Basis truncation is a five syllable name for a very simple minded operation. One
simply chooses to throw out each basis state that is far from some point $x$. Typically
"far from" means that the distance $|y - x|$ between the basis state $y$ and the point
$x$ is larger than a truncation radius $R$. After the truncation, one is left with only
$\sim R^D$ basis states, where $D$ is the dimension. This number does not depend on
the total basis size $N$ at all.

The basis truncation is, of course, accompanied by corresponding truncations of 
all matrices. 

Basis truncation algorithms generally pick $O(N)$ different points distributed
throughout the system, generate a truncated basis at each of those points, and
then do the same computation in each of the truncated bases. Since computations
in any single basis require only $O(1)$ time, the total time requirement is of order
$O(N)$.

In other words, this approach basically divides the system into pieces, and
computes them separately.
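
As a minimal illustration of basis truncation, the following Python sketch estimates one diagonal element of a matrix function using only the basis states within a truncation radius, and compares it with the full calculation. The disordered one-dimensional chain, the choice of the function $\exp(-H)$, and all names in the code are assumptions made for the example; dense NumPy arrays are used for clarity rather than speed.

```python
import numpy as np

def local_function_element(H, positions, site, R, f):
    """Estimate <site|f(H)|site> using only basis states within distance R of `site`."""
    keep = np.flatnonzero(np.abs(positions - positions[site]) <= R)
    H_local = H[np.ix_(keep, keep)]          # truncated matrix P H P
    evals, U = np.linalg.eigh(H_local)       # P H P = U D U^T
    row = U[np.searchsorted(keep, site)]     # components of `site` in the local eigenbasis
    return float(np.sum(row * f(evals) * row)), keep.size

# Hypothetical test system: a disordered 1D chain with nearest-neighbour hopping.
rng = np.random.default_rng(0)
N = 200
positions = np.arange(N, dtype=float)
H = np.diag(rng.uniform(-1.0, 1.0, N)) - np.eye(N, k=1) - np.eye(N, k=-1)

f = lambda e: np.exp(-e)                     # matrix function to evaluate
site, R = N // 2, 10
local_value, n_kept = local_function_element(H, positions, site, R, f)

# Reference value from diagonalizing the full matrix (cost grows as N^3).
evals, U = np.linalg.eigh(H)
exact_value = float(np.sum(U[site] * f(evals) * U[site]))

print(n_kept, abs(local_value - exact_value))
```

The work done inside `local_function_element` depends on $R$ but not on $N$, so repeating it at $O(N)$ sites gives the linear scaling described above.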
7.1.3 Generalized Multiplication

In this approach one avoids dividing the system into pieces, but instead changes the
rules for doing multiplication. Ordinarily computing the product $AB = C$ of two
$N \times N$ matrices $A$ and $B$ requires $O(N^3)$ time. One avoids this cost by truncating
matrix elements from $A$, $B$, and $C$. The usual rule is that a matrix element $\langle x|A|y\rangle$
is set to zero if $|x - y| > R$, where $R$ is a truncation radius. The reader can easily
check that if this truncation rule is applied to $A$ and $B$ then their product $C = AB$
can be evaluated in $O(N)$ time. However $C$ may end up having non-zero matrix
elements whose distance away from the diagonal is greater than $R$. In order to
avoid a gradual widening of matrices as more and more multiplications are done,
one therefore truncates $C$ after the multiplication, cutting it down to reside entirely
within the truncation radius.

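This generalized multiplication can be written down mathematically with the
following formula:

$$ C_{ij} = \Theta(|\vec{x}_i - \vec{x}_j| - R) \sum_k A_{ik} B_{kj}\, \Theta(|\vec{x}_i - \vec{x}_k| - R)\, \Theta(|\vec{x}_k - \vec{x}_j| - R) \tag{7.1} $$

The vectors $\vec{x}_i$, $\vec{x}_j$, and $\vec{x}_k$ are just the spatial positions of the basis states $i$, $j$, and
$k$. Each factor $\Theta$ is equal to 1 when the corresponding distance lies within the
truncation radius $R$ and is zero otherwise, in keeping with the truncation rule.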
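
The truncation rule for products can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-dimensional positions, the example matrices, and the function names are hypothetical, and dense arrays are used for readability, whereas an actual $O(N)$ implementation would store only the matrix elements inside the truncation radius.

```python
import numpy as np

def truncate(M, positions, R):
    """Zero every matrix element whose basis states are farther apart than R."""
    dist = np.abs(positions[:, None] - positions[None, :])
    return np.where(dist <= R, M, 0.0)

def generalized_multiply(A, B, positions, R):
    """Truncate A and B, multiply them, then truncate the product."""
    A_t = truncate(A, positions, R)
    B_t = truncate(B, positions, R)
    return truncate(A_t @ B_t, positions, R)

N = 8
positions = np.arange(N, dtype=float)
A = 2.0 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)   # tridiagonal example matrix
B = A.copy()

C_wide = generalized_multiply(A, B, positions, R=2.0)    # radius covers the full product
C_narrow = generalized_multiply(A, B, positions, R=1.0)  # radius cuts off the widened band
print(C_wide[0, 2], C_narrow[0, 2])
```

With $R = 2$ the product of the two tridiagonal matrices fits inside the truncation radius and nothing is lost; with $R = 1$ the elements at distance 2 are discarded, which is the widening that the final truncation is meant to prevent.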

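Generalized multiplication can be understood with much more mathematical
precision as a tensor with six indices. Formula 7.1 can be rewritten in terms of the
generalized multiplication tensor $M_{ijcdef}$ as:

$$
\begin{aligned}
C_{ij} &= \sum_{cdef} M_{ijcdef} A_{cd} B_{ef} \\
M_{ijcdef} &= \delta_{ic}\delta_{de}\delta_{fj}\, \Theta(|\vec{x}_i - \vec{x}_j| - R)\, \Theta(|\vec{x}_i - \vec{x}_d| - R)\, \Theta(|\vec{x}_d - \vec{x}_j| - R)
\end{aligned}
\tag{7.2}
$$

Ordinary multiplication corresponds to choosing a specific tensor $M_{ijcdef} = \delta_{ic}\delta_{de}\delta_{fj}$.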

The tensor notation is superior because the whole machinery of linear algebra
can be applied to tensors: they have eigenvalues, eigen-tensors, singular value
decompositions, et cetera. Therefore one can do a rigorous mathematical analysis of
the behavior of $O(N)$ algorithms: their convergence, numerical stability, etc. One
of the most important questions is of course understanding the errors induced by
using generalized multiplication, which will require understanding how the
eigenvalues and eigen-tensors of the generalized multiplication tensor relate to those of
the ordinary multiplication tensor. Unfortunately nobody seems to have started
doing this analysis yet.

In some algorithms the multiplications may always have the same matrix $B$ on
the right hand side of the multiplication. In this case things simplify a bit, and
instead of worrying about the six-index tensor $M$ one is instead concerned about
the four-index tensor $MB$, and in particular about how the eigenvalues and
eigen-tensors of the generalized $MB$ relate to those of its ordinary counterpart.

This completes the basic concepts of $O(N)$ algorithms. They're really not
difficult.

7.2 Catalog of O(N) algorithms

I now give a catalog of all the $O(N)$ algorithms which I have seen published in the
literature, plus one more of my own.
7.2.1 Basis Truncation Algorithms

There are three basis truncation algorithms: Yang's Divide and Conquer
algorithm [1], the "Locally Self-Consistent Multiple Scattering" algorithm [77], and
Goedecker's "Chebyshev Fermi Operator Expansion" [35,36], which I will
henceforth call the Goedecker algorithm. Basis truncation algorithms break the matrix
function into spatially separated pieces. Given the position of a particular piece, the
basis is truncated as described previously in section 7.1.2 and then the matrix
function is calculated within the truncated basis. Thus, for any generic matrix function
$f(H)$, a basis truncation algorithm calculates $\langle x|f(H)|y\rangle \approx \langle x|f(P_{x,y} H P_{x,y})|y\rangle$,
where $P_{x,y}$ is a projection operator truncating all basis elements far from $x$, $y$.
There may be also an additional step of interpolating results obtained with different
$P$'s, but I will not discuss this complication here.

The three basis truncation algorithms are distinguished only by their choice of
how to evaluate the function $f(P_{x,y} H P_{x,y})$. Here are the specifics:

• Yang's algorithm calculates $f$ by diagonalizing the truncated argument $PHP$:
$PHP = UDU^\dagger$, where $D$ is the diagonal matrix of eigenvalues and $U$ is a
transformation matrix built from eigenvectors. Yang's algorithm then uses the
standard formula $f(H) \approx U f(D) U^\dagger$ to obtain its final result.

• The Goedecker algorithm does a Chebyshev expansion of $f$. As long as all
the eigenvalues of the argument $H$ are between $1$ and $-1$, a matrix
function may be expanded in a series of Chebyshev polynomials of $H$:
$f(H) = \sum_{s=0}^{S} c_s T_s(H)$. The coefficients $c_s$ are independent of the basis size, and
therefore can be calculated numerically in the scalar case. The Chebyshev
polynomials can be calculated in $O(N)$ time using the recursion relation
$T_{s+1} = (2HT_s) - T_{s-1}$, $T_1 = H$, $T_0 = 1$. (Of course, one must also bound
the highest and lowest eigenvalues of $H$ and then normalize. In practice, very
simple heuristics are sufficient for estimating these bounds.) If the matrix
function $f(H)$ has a characteristic scale of variation $\alpha$, then the error induced
by the Chebyshev expansion is controlled by an exponential with argument of
order $-\alpha S$.

• The "Locally Self-Consistent Multiple Scattering" algorithm calculates the
argument's resolvent or Green's function $G(E) = (E - H)^{-1}$ within the
truncated basis. It then obtains $f$ via contour integration:
$f(H) = (2\pi i)^{-1} \oint dE\, f(E) G(E)$. The contour must include the entire
spectrum of $H$ unless $f(E)$ is zero over some portion of the complex plane.
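
The Chebyshev recursion in the second item can be sketched as follows. This is a hedged illustration rather than Goedecker's implementation: the coefficient formula is the standard one for interpolation at Chebyshev nodes, the test matrix and the choice $f = \exp$ are assumptions, and dense matrix products stand in for the truncated products that an $O(N)$ code would use.

```python
import numpy as np

def chebyshev_coefficients(f, degree):
    """Scalar Chebyshev coefficients c_0..c_degree of f on [-1, 1]."""
    M = degree + 1
    theta = np.pi * (np.arange(M) + 0.5) / M
    values = f(np.cos(theta))                 # f sampled at the Chebyshev nodes
    c = np.array([2.0 / M * np.sum(values * np.cos(s * theta)) for s in range(M)])
    c[0] *= 0.5
    return c

def chebyshev_matrix_function(H, coeffs):
    """Sum c_s T_s(H) using T_{s+1} = 2 H T_s - T_{s-1}, T_1 = H, T_0 = 1."""
    T_prev = np.eye(H.shape[0])
    T_curr = H.copy()
    result = coeffs[0] * T_prev + coeffs[1] * T_curr
    for c in coeffs[2:]:
        T_prev, T_curr = T_curr, 2.0 * H @ T_curr - T_prev
        result = result + c * T_curr
    return result

# Hypothetical symmetric test matrix, normalized so its spectrum lies inside [-1, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 30))
H = (X + X.T) / 2.0
H = H / (1.01 * np.linalg.norm(H, 2))

approx = chebyshev_matrix_function(H, chebyshev_coefficients(np.exp, 20))

evals, U = np.linalg.eigh(H)                  # reference: f(H) = U f(D) U^T
exact = (U * np.exp(evals)) @ U.T
max_error = np.max(np.abs(approx - exact))
print(max_error)
```

The reference value in the last lines is computed by diagonalization, which is the formula used in Yang's algorithm, so the sketch also compares two of the three approaches on the same matrix.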

Because these algorithms are all mathematically equivalent when applied to 
analytic functions, they should all converge to identical results, as long as one 
makes identical choices of which matrix function to evaluate, of how to break up 
the function, of which projection operator to use, and of a possible interpolation 
scheme. Moreover, given an identical choice of matrix function, variations in the 
other choices should obtain results that are qualitatively the same. 

No doubt other $O(N)$ algorithms could be invented simply by choosing one's
favorite algorithm for calculating functions of small matrices and combining it with
basis truncation. I discussed only these three because they are the only ones that
have been discussed in the $O(N)$ literature.
7.2.2 The Locally Self-Consistent Green's Function Algorithm

This algorithm [33,34] is almost exactly the same as the "Locally Self-Consistent
Multiple Scattering" algorithm; it too computes the Green's function in a truncated
basis and then does a contour integral to obtain the desired matrix function. There
is, however, one significant difference: it adds some information about the
long-distance physics to the truncated calculation. Specifically, this algorithm uses the
Coherent Potential Approximation to estimate the self-energy $\Sigma$ of the average
Green's function, which is a way of describing the decay which disorder induces in
the Green's function. This algorithm then uses the formalism of scattering theory,
which I will discuss in section 7.3.1, to insert the self-energy $\Sigma$ into the calculation
of the Green's function within the truncated basis. The insertion is done in a way
that would not influence the end result at all if there were no basis truncation and
the system were infinitely big. However in a truncated basis it can make a large
difference, allowing the algorithm to feel the average influence of sites far outside
of the truncation radius. The net result is that one can choose a much smaller
truncation radius to obtain a given accuracy.

7.2.3 Functional Minimization 

Now I turn to algorithms which are based on generalized multiplication. The idea 
is to create a functional of the matrix X whose minimum corresponds to X hav- 
ing the value equal to the desired matrix function. One evaluates the functional 
using generalized multiplication, and then with the functional in hand one uses a 
minimization algorithm, for instance the conjugate gradient algorithm, to find the 
minimum of the functional. By construction one thus arrives at the desired function. 

Note that these functionals, and in particular their minima, are designed in the 
world of normal multiplication. Once generalized multiplication is introduced, any 
particular minimum could change, disappear, or become multiple minima. This 
difficulty is generally just accepted as one of the prices of $O(N)$ performance, and
the details of how the minima are affected by generalized multiplication have not 
been discussed in much detail. 

The $O(N)$ literature has concentrated on algorithms for computing the density
matrix function, or step function, $\rho(H) = \Theta(H - \mu)$. The scalar $\mu$ is called the
Fermi Energy, and just acts as an offset to the energy. Several functionals suitable
for calculating the density matrix have been discussed in a lot of detail in the $O(N)$
literature; here I only present one. In 1993 Li, Nunes, and Vanderbilt [78] introduced
the functional $Z = \mathrm{Tr}((H - \mu)(3F^2 - 2F^3))$. The Li-Nunes-Vanderbilt functional
is basically just a cubic polynomial with extrema at 0 and 1. The factor $H - \mu$
determines the overall sign of the polynomial and therefore determines whether a
particular eigenvalue of $F$ will prefer to be 0 or 1. After converging to the minimum,
$F$ will have the same eigenvectors as $H$. $F$'s eigenvalues $f$, however, will be different
than the eigenvalues $\epsilon$ of $H$, and will be given by the relation $f(\epsilon) = \Theta(\epsilon - \mu)$. In
other words, $F$ will converge to be equal to the density matrix function.

There is, however, a proviso: the initial value of $F$ must have all its eigenvalues
within a certain interval; otherwise $F$ will diverge. This is caused by the fact that
the minima at 0 and 1 are local minima, not global minima.

Now for my new algorithm, designed for evaluating the matrix logarithm:
minimize the functional $Z = \mathrm{Tr}(\exp(HF) - F)$. This functional has a single global
minimum at $F = \ln H$. I showed numerically in chapter 4 that basis truncation
algorithms can evaluate the matrix exponential in disordered systems with
extraordinary accuracy; perhaps this minimization algorithm could achieve a similar accuracy
for the logarithm?

In section 7.3 I will give another functional minimization algorithm, this one for
computing the Green's function.

7.2.4 Iterative Algorithms 

Consider applying the mapping $x \to 3x^2 - 2x^3$ over and over to a number. It is
easy to prove (see Palser and Manolopoulos's paper [79]) that if the initial value $x_0$
is in the interval $\frac{1}{2} < x_0 < 1$, then $x$ will converge to 1. If, on the other hand, the
initial value is in the interval $0 < x_0 < \frac{1}{2}$, then $x$ will converge to 0. In either case
the convergence will be quadratic, meaning that the number of accurate digits in $x$
will double at every step. This is called McWeeny purification.

I restate this result: if we restrict the initial value to be in the interval $0 < x_0 < 1$,
then the iteration converges quadratically to a value equal to $\Theta(\frac{1}{2} - x_0)$. Thus I
have an iterative way of computing the step function.

This algorithm can be easily generalized to compute the density matrix [79]. One
starts by normalizing the matrix $H - \mu$ so that all its eigenvalues lie between 0 and
1: $F = \frac{1}{2} - \frac{H - \mu}{2\lambda}$, where the normalization constant $\lambda$ is equal either to the
absolute value of the largest eigenvalue of $H - \mu$ or the absolute value of the smallest
eigenvalue of $H - \mu$, whichever is larger. Having performed this normalization, one
just iterates: $F \to 3F^2 - 2F^3$. The result converges quadratically to the density
matrix.
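
The purification iteration is short enough to sketch directly in Python. The example below is an illustration under assumptions: the test matrix with a prescribed spectrum, the Fermi energy $\mu = 0$, the number of iterations, and all names are hypothetical, and the starting matrix is taken as $F = \frac{1}{2} - (H - \mu)/(2\lambda)$, an assumed sign convention under which eigenvalues of $H$ below $\mu$ flow to 1.

```python
import numpy as np

def purify_step(F):
    """One McWeeny purification step, F -> 3 F^2 - 2 F^3."""
    F2 = F @ F
    return 3.0 * F2 - 2.0 * F2 @ F

# Scalar version: starting values above and below 1/2.
x_hi, x_lo = 0.6, 0.4
for _ in range(20):
    x_hi = 3.0 * x_hi**2 - 2.0 * x_hi**3
    x_lo = 3.0 * x_lo**2 - 2.0 * x_lo**3

# Matrix version on a hypothetical symmetric H with a known spectrum.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
energies = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 3.0])
H = (Q * energies) @ Q.T
mu = 0.0                                          # assumed Fermi energy

shifted = H - mu * np.eye(6)
lam = np.max(np.abs(np.linalg.eigvalsh(shifted)))  # normalization constant
F = 0.5 * np.eye(6) - shifted / (2.0 * lam)        # eigenvalues now lie in [0, 1]
for _ in range(30):
    F = purify_step(F)

# Reference: projector onto the eigenvectors of H with energy below mu.
P = Q[:, :3] @ Q[:, :3].T
print(x_hi, x_lo, np.max(np.abs(F - P)))
```

In an $O(N)$ setting the two matrix products in `purify_step` would be replaced by generalized multiplication; the dense products here only demonstrate the convergence of the iteration itself.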

This algorithm is probably the only case where I have seen the effects of gen- 
eralized truncation studied in any kind of mathematical detail. The study was 
done by Bowler and Gillan [37], who concluded that it is best to start by applying 
this iteration, and then at some point polish up the final result by minimizing the 
Li-Nunes-Vanderbilt functional. 

In section 7.3 I will propose another iterative algorithm appropriate for computing
the Green's function.

7.2.5 Chebyshev Expansion and Green's Function Algorithms

Two of the basis truncation algorithms can be easily adapted to use generalized
multiplication instead of basis truncation. Both algorithms have the advantage of
being general purpose algorithms able to calculate many different matrix functions.
One of these algorithms is the Chebyshev expansion. The other is the approach
of computing matrix functions using the formula $f(H) = (2\pi i)^{-1} \oint dE\, f(E) G(E)$.
This last algorithm requires an $O(N)$ algorithm for calculating the Green's function;
I will discuss many candidate Green's function algorithms in section FTSl and chapter 

m 
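
The contour integral formula can be illustrated with a short Python sketch. It is only a demonstration under assumptions: the contour is taken to be a circle of radius 2 around the origin, discretized with the trapezoid rule, the test matrix is normalized so that its spectrum lies inside the unit interval, and the Green's function is obtained by dense inversion rather than by an $O(N)$ algorithm.

```python
import numpy as np

def contour_matrix_function(H, f, radius, n_points):
    """Approximate (2 pi i)^{-1} * contour integral of f(E) G(E) on a circle."""
    N = H.shape[0]
    identity = np.eye(N)
    total = np.zeros((N, N), dtype=complex)
    for k in range(n_points):
        E = radius * np.exp(2j * np.pi * (k + 0.5) / n_points)
        G = np.linalg.inv(E * identity - H)   # Green's function G(E) = (E - H)^{-1}
        total += f(E) * E * G                 # dE = i E dtheta on the circle
    return (total / n_points).real            # H is real symmetric and f is real on the real axis

# Hypothetical symmetric test matrix with spectrum inside [-1, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 12))
H = (X + X.T) / 2.0
H = H / (1.01 * np.linalg.norm(H, 2))

approx = contour_matrix_function(H, np.exp, radius=2.0, n_points=64)

evals, U = np.linalg.eigh(H)                  # reference: f(H) = U f(D) U^T
exact = (U * np.exp(evals)) @ U.T
max_error = np.max(np.abs(approx - exact))
print(max_error)
```

Each point on the contour requires one Green's function, which is why this approach is only as fast as the Green's function algorithm that feeds it.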

7.2.6 Bond Order Potential Algorithms 

There is also a class of algorithms - called Bond Order Potential algorithms - which
are based on the Lanczos algorithm for tridiagonalizing a matrix. One feeds the
tridiagonal matrix generated by the Lanczos algorithm into a recursive equation
and obtains diagonal elements of the Green's function $(E - H)^{-1}$. One cuts off
the recursion after some number of terms, which is why this is an $O(N)$ algorithm. It
is also possible to obtain off-diagonal elements, but the algebra isn't pretty and the
numerics are quite unstable. Still, at least one numerical study [80] indicates that
a Bond Order Potential algorithm may be more suitable to certain problems than
some of the other $O(N)$ algorithms I have discussed.

I don't understand the mathematical details of obtaining the off-diagonal ele- 
ments, and will just refer the reader to several papers on the subject [80-83]. 
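
For the diagonal elements, the Lanczos-plus-recursion idea can be sketched in Python as below. This is a generic textbook construction (Lanczos coefficients fed into a continued fraction), not the specific scheme of the Bond Order Potential papers; the test matrix, the complex energy, and the function names are assumptions made for the illustration.

```python
import numpy as np

def lanczos_coefficients(H, start, n_levels):
    """Diagonal (a) and off-diagonal (b) entries of the Lanczos tridiagonal matrix."""
    a, b = [], []
    v_prev = np.zeros_like(start)
    v = start / np.linalg.norm(start)
    beta = 0.0
    for n in range(n_levels):
        w = H @ v - beta * v_prev
        alpha = float(v @ w)
        w = w - alpha * v
        a.append(alpha)
        beta = float(np.linalg.norm(w))
        if n == n_levels - 1 or beta < 1e-12:
            break                              # recursion cut off here
        b.append(beta)
        v_prev, v = v, w / beta
    return np.array(a), np.array(b)

def green_diagonal(a, b, E):
    """Continued fraction 1/(E - a_0 - b_0^2/(E - a_1 - ...)), evaluated bottom-up."""
    g = 0.0
    for n in reversed(range(len(a))):
        tail = b[n] ** 2 * g if n < len(b) else 0.0
        g = 1.0 / (E - a[n] - tail)
    return g

# Hypothetical symmetric test matrix and complex energy.
rng = np.random.default_rng(2)
X = rng.normal(size=(8, 8))
H = (X + X.T) / 2.0
E = 0.3 + 0.1j
start = np.zeros(8)
start[0] = 1.0                                 # ask for the (0, 0) element

a_full, b_full = lanczos_coefficients(H, start, 8)
g_full = green_diagonal(a_full, b_full, E)     # all levels kept
g_cut = green_diagonal(*lanczos_coefficients(H, start, 4), E)  # recursion cut off early

g_exact = np.linalg.inv(E * np.eye(8) - H)[0, 0]
print(abs(g_full - g_exact), abs(g_cut - g_exact))
```

Keeping every level reproduces the exact diagonal element up to rounding, while `g_cut` shows the kind of approximation that results from cutting off the recursion after a fixed number of terms.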

7.2.7 Stochastic algorithms 

It may also be possible to use stochastic methods to obtain rough estimates of matrix
functions. In a recent paper, Buchmann and Petersen [84] showed that if you evolve
a vector $x$ under a force $-Ax + \eta$ where $A$ is symmetric and positive definite and $\eta$
is a noise term, then the average value of the outer product $x \otimes x$ will converge to
$\frac{1}{2}A^{-1}$. I suspect that this particular algorithm may not be very efficient compared
to other algorithms for the inverse, and probably scales worse than $O(N)$. However,
it does give some hope that other faster stochastic algorithms could be developed.
There are also stochastic methods of estimating individual matrix elements of matrix
functions; if you were to find one that takes $O(1)$ time per matrix element and does
not require calculation of matrix elements far from the diagonal, then you would
have an $O(N)$ algorithm.
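
The limiting value of the averaged outer product can be checked without any random sampling by evolving the covariance directly. The Python sketch below is not the stochastic algorithm itself: it assumes unit-strength white noise, for which the covariance $C$ of $x$ obeys $dC/dt = -AC - CA + 1$, and it integrates that equation with simple Euler steps. The test matrix, step size, and step count are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical symmetric positive definite matrix.
rng = np.random.default_rng(4)
X = rng.normal(size=(5, 5))
A = X @ X.T / 5.0 + np.eye(5)

# Covariance C(t) of x under dx = -A x dt + dW, assuming unit white noise:
# dC/dt = -A C - C A + 1.
dt = 0.5 / np.linalg.norm(A, 2)        # small enough for a stable Euler step
C = np.zeros((5, 5))
for _ in range(2000):
    C = C + dt * (np.eye(5) - A @ C - C @ A)

target = 0.5 * np.linalg.inv(A)
max_error = np.max(np.abs(C - target))
print(max_error)
```

A stochastic implementation would replace the deterministic covariance update by an average over noisy trajectories, at the price of statistical error.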

7.2.8 Other Algorithms 

If you just want to calculate some matrix elements of a matrix function (for instance 
its diagonal elements or its trace), or you want to calculate the matrix function times 
a vector {f{H)b), then there is a wide variety of algorithms specially designed for 
these tasks. Some of these algorithms are stochastic. I do not review any of them 
here. 

This completes my review of existing $O(N)$ algorithms. For more depth, I refer
the reader to these reviews [2,18-21,80]. 

7.3 Calculating the Green's Function 

In chapter 4 I showed that basis truncation algorithms perform very poorly in
calculations of the real part of the Green's function. The $O(N)$ literature contains only
two other suggestions for calculating the Green's function: the Bond Order Potential
algorithms [80,81,85] (which are based on a Lanczos approach and have
significant problems with numerical stability), and the "Locally Self-Consistent Green's
Function" algorithm [33,34]. This last algorithm requires as input an average of
the Green's function over all possible disorder configurations. This average is not
always easy to compute. Both approaches have significant limitations.

Here I present many alternative algorithms for computing the Green's function
in $O(N)$ time. In one sense they are all due to me. In another sense, some of them
are well known $O(N^3)$ formulas and the only new thing I have done is propose to
reduce them to $O(N)$ by using generalized multiplication. Other formulas are not
so well known: I have never seen them before. I will try to clearly signal what I
have and have not seen before. In any case, all of the formulas are pretty simple
minded, at least from the point of view of someone who knows scattering theory.
Since scattering theory is required for understanding some (but not all) of these
algorithms, I will now spend a little time explaining the necessary basics.

7.3.1 Introduction to Scattering Theory 

Scattering theory is just a particular approach to calculating the Green's function
$G(E) = (E - H)^{-1}$, where the energy $E$ is a complex number and the Hamiltonian
$H$ is a matrix. I will explain the particulars of this approach shortly, but first a few
words about the basis.

Although the equations of scattering theory do not require any particular choice 
of basis, it is conventional in scattering theory to assume that each basis state is 
physically located at a particular site on a lattice representing the physical system. 
At each site there may be one or more basis states; each basis state is specified by 
two indexes: $|v i\rangle$, where $v$ is the site index and $i$ specifies the $i$-th state at site $v$.
Matrix elements connecting two sites are called hopping matrix elements. Matrices 
without any hopping matrix elements - i.e. ones which are diagonal in the site index 
- are called local. 

The key idea of scattering theory is to decompose the Hamiltonian $H$ into two
parts $H = H_0 + V$: one part $H_0$ whose inverse is trivial to calculate, and another
part $V$ whose inverse is more complicated. The trivial part is usually called the
"free" Hamiltonian, while the nontrivial part is often called the potential, especially
if it is local. Because $H_0$'s solutions are exactly known, a "free propagator" (or
"free Green's function") may be defined: $G_0(E) = (E - H_0)^{-1}$. I can now rewrite
the Green's function in terms of a combination of the free propagator and $V$:

$$G(E) = \left(G_0^{-1}(E) - V\right)^{-1} \qquad (7.3)$$

This equation is the starting point of scattering theory; the details are all in figuring out
how $V$ influences $G(E)$. The advantage of scattering theory lies exactly here, in the
fact that one does as much as possible exactly, by calculating $G_0$. The non-exact
steps, the approximations, are reserved to handle $V$.

What decomposition of $H$ into $H_0$ and $V$ should one choose? This decision is
determined by the hopping matrix elements, i.e. the matrix elements of $H$ which are
not diagonal in the site index $v$. If they are irregular and cannot be easily inverted,
then one must choose $H_0$ to be the part of $H$ which is diagonal in the site index,
because then $G_0$ will also be diagonal in the site index and can be computed very
quickly. However, often the hopping matrix elements of $H$ are regular and thus
easily inverted. This happens with partial difference equations; for instance the
only non-local term in the Schrödinger equation is a second derivative representing
the kinetic energy, and is spatially uniform. In crystals as well, the hopping matrix
elements are regular, and one uses KKR band theory [31,32,34,86,87]. In such
cases $H_0$ should be chosen to contain all hopping matrix elements, and thus the
bare propagator $G_0$ will give a complete description of propagation between sites.
The remaining irregular part $V$ will be diagonal in the site index; thus one obtains a
picture of particles moving between sites via $G_0$ and scattering off of the potential
$V$ which exists at individual sites.

Some of the following algorithms will make an intermediate step in the process
of calculating $G(E)$ and calculate the scattering matrix $T = (1 - V G_0)^{-1} V$ first.
Once one has computed the $T$ matrix, the Green's function $G$ is trivial:

$$G = G_0 + G_0 T G_0 \qquad (7.4)$$

Of course, the free propagator can be expected to extend throughout the entire
system, so the multiplications in the last term of equation 7.4 require $O(N^3)$ time.
However, if one substitutes the normal multiplications with generalized
multiplications as defined in section 7.1.3 then the multiplications take only $O(N)$ time. This
substitution is perhaps reasonable in disordered systems, where one expects $G(E)$
to die off exponentially away from the diagonal.
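The relations between $G$, $G_0$, and $T$ can be checked numerically on a small system. The following is a minimal sketch, not taken from the thesis: it assumes a hypothetical one-dimensional chain with uniform nearest-neighbour hopping as $H_0$, a random site-diagonal potential $V$, and ordinary dense multiplication (no generalized multiplication). All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                      # number of sites, one basis state per site (assumption)
E = 0.3 + 0.5j             # complex energy, away from the real spectrum
I = np.eye(N)

# H0: uniform nearest-neighbour hopping (the "regular" part of H).
H0 = -(np.eye(N, k=1) + np.eye(N, k=-1))
# V: local (site-diagonal) random potential.
V = np.diag(rng.uniform(-1.0, 1.0, size=N))

G_exact = np.linalg.inv(E * I - H0 - V)        # G = (E - H)^{-1}
G0 = np.linalg.inv(E * I - H0)                 # free propagator
G_from_G0 = np.linalg.inv(np.linalg.inv(G0) - V)   # G = (G0^{-1} - V)^{-1}
T = np.linalg.inv(I - V @ G0) @ V              # T = (1 - V G0)^{-1} V
G_from_T = G0 + G0 @ T @ G0                    # G = G0 + G0 T G0

print(np.allclose(G_exact, G_from_G0), np.allclose(G_exact, G_from_T))
```

With dense products every step here costs $O(N^3)$; the sketch only confirms the algebra, not the scaling.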

What is the physical significance of the scattering matrix $T$? It sums up the
physics of scattering: $V|\psi\rangle = T|\phi\rangle$, where $|\phi\rangle$ is the unscattered wave and $|\psi\rangle$
is the full solution to the scattering problem. Bound states are represented
non-perturbatively in the $T$ matrix via the inverted matrix $(1 - V G_0)^{-1}$. One may
expand that quantity perturbatively to obtain the Born approximations; for instance,
in the lowest order Born approximation $T \approx V + V G_0 V$. Of course in this
perturbative expansion all bound states are lost.
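The Born series follows from expanding the inverse in the definition of the scattering matrix as a geometric series. A short sketch, assuming the series converges (for example when some norm of $V G_0$ is less than one):

```latex
\begin{align*}
T &= (1 - V G_0)^{-1} V = \sum_{k=0}^{\infty} (V G_0)^k \, V \\
  &= V + V G_0 V + V G_0 V G_0 V + \cdots
\end{align*}
```

Truncating after a finite number of terms gives the Born approximations. A bound state appears as a pole of $(1 - V G_0)^{-1}$, and no finite truncation of the series has such a pole, which is why the perturbative expansion loses the bound states.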

The reader who has already studied scattering theory should be warned that
many of the scattering theory results and ideas that she has learned in the context of
continuum scattering are not applicable to the discrete systems studied here. In
the continuum typically one uses the language of phase shifts and spherical plane
waves, which are best suited to rotationally invariant systems, not to lattices. Often
there is also a focus on the perturbative Born expansion, which is not appropriate
for studying bound states. In contrast, I use a discrete basis representing a lattice,
and never make a perturbative expansion. Neither the language of phase shifts nor
many of the standard scattering equations are applicable.

Computation of the total scattering matrix $T$ always begins by calculating the
scattering behavior of each site individually. I partition the potential $V$ site-wise,
and define $V_v$ as the portion of $V$ which is located at site $v$. (If the potential $V$
is not local, any nearly local partitioning scheme will do.) I next define a site-wise
scattering matrix:

$$T_v \equiv (1 - V_v G_0)^{-1} V_v \qquad (7.5)$$

The matrix elements $\langle w i|V_v|x j\rangle$ are zero unless the site indices $w$, $x$, and $v$ all have
the same value. One can easily see that the site-wise scattering matrix $T_v$ has the
same property. $T_v$ can be computed by doing mathematical operations using only
the basis elements at site $v$. This calculation is straightforward and I will not spend
more time on it.

After computing the site-wise scattering matrices $T_v$, there always follows a
process of using $T_v$ and $G_0$ to calculate the total scattering matrix $T$. The details
of this process vary from algorithm to algorithm, and I will explain them as I come
to them.

This completes my introduction to scattering theory. Now for the first class of
$O(N)$ algorithms for computing the Green's function: summation formulae.

7.3.2 Summation Formulae 

The basic idea of this section is to calculate either the total Green's function G or 
the total scattering matrix T in increments, adding one lattice site at a time. So 
one starts with the Green's function of only one site, and then adds another site 
to get the Green's function of two sites. Adding another site, and another, one 
eventually obtains the Green's function (or scattering matrix) of the whole system.

7.3.2.1 Scattering Matrix Summation

First, the iterative summation of site-wise $T_v$ matrices. The idea is to start by finding
the total scattering matrix for two sites $v$ and $w$; I will call this $T^2$. $T^2$ is not a
simple sum of $T_v$ and $T_w$, because of the possibility of scattering off of first site $v$
and then site $w$. Nonetheless a sort of addition is possible and I will shortly present
the correct formula. Having added two sites, it is easy to add another site by simply
considering $v$ and $w$ as a single site. Thus I can use the same addition formula to
add a third site $T_x$ to $T^2$, obtaining $T^3$, which represents the total scattering of sites
$v$, $w$, and $x$. This process can proceed ad infinitum until I have added all the sites
to get the total scattering matrix $T$.

I will skip the details of proving the $T$ matrix addition formula, but will simply
state the final result. The proof does rely on the fact that any $T$ matrix representing
the total scattering behavior of several sites may be decomposed in terms of first and
last scatterers: $T = \sum_{v,w} T_{vw}$, where each term $T_{vw}$ describes the scattering which
begins at site $v$ and ends at site $w$. I define some new notation before presenting
the formula: $T^n$ is the scattering matrix of $n$ sites taken together. $T^{n+1}$ is the
scattering matrix of all $n$ sites plus another site $x$, all taken together. $P_v$
is the projection operator which contains only the states at site $v$; when it occurs
it restricts sums over states to only the states at $v$.

I now give the summation formula. If $V$ is local, the $T$ matrix addition formula
is:

$$T^{n+1} = T^n + (1 + T^n G_0) P_x (1 - T_x G_0 T^n G_0)^{-1} T_x (1 + G_0 T^n) \qquad (7.6)$$

The generalization to non-local potentials V is straightforward. There is also 
an easy generalization to adding several sites at a time. 

The interpretation of formula 7.6 can be obtained by expanding the inverted
term in a power series and then reading individual terms in the series either from
right to left or from left to right. In particular, the inverted term represents the
possibility of repeated scattering back and forth between $T^n$ and $T_x$.

The computation of the inverted term may seem to threaten an $O(N^3)$ cost, but
in fact this term is local to site $x$, so the inversion occurs in the $x$-th site's basis, and
requires very little time. The real problem comes from the matrix multiplications,
which require $O(N^3)$ time unless one uses generalized multiplication as defined
in section 7.1.3. In this latter case, adding a site requires only $O(1)$ time, and
calculating the total scattering matrix requires $O(N)$ time. The Green's function
can be calculated in turn in only $O(N)$ time using equation 7.4. This completes
my presentation of the iterative summation algorithm for calculating the Green's
function.
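The site-by-site summation can be sketched in a few lines. This is an illustration under stated assumptions, not the thesis's implementation: it reads the addition formula as $T^{n+1} = T^n + (1 + T^n G_0) P_x (1 - T_x G_0 T^n G_0)^{-1} T_x (1 + G_0 T^n)$, uses a hypothetical chain with one basis state per site (so every site-local inverse is a scalar division), and uses ordinary dense products rather than generalized multiplication.

```python
import numpy as np

rng = np.random.default_rng(1)
N, E = 6, 0.2 + 0.4j
I = np.eye(N)
H0 = -(np.eye(N, k=1) + np.eye(N, k=-1))   # regular hopping part
v = rng.uniform(-1.0, 1.0, size=N)          # local potential, one value per site
G0 = np.linalg.inv(E * I - H0)              # free propagator

def add_site(Tn, x):
    """Add site x to the scattering matrix Tn of the sites included so far."""
    t_x = v[x] / (1.0 - v[x] * G0[x, x])    # site-wise T_x (a scalar here)
    M = G0 @ Tn @ G0
    local = t_x / (1.0 - t_x * M[x, x])     # P_x (1 - T_x G0 T^n G0)^{-1} T_x
    left = (I + Tn @ G0)[:, [x]]            # (1 + T^n G0) P_x : one column
    right = (I + G0 @ Tn)[[x], :]           # P_x (1 + G0 T^n) : one row
    return Tn + local * (left @ right)

T = np.zeros((N, N), dtype=complex)         # no sites included yet
for x in range(N):
    T = add_site(T, x)

T_direct = np.linalg.inv(I - np.diag(v) @ G0) @ np.diag(v)
print(np.allclose(T, T_direct))
```

The only inverse taken inside `add_site` is local to site `x`; the dense products `Tn @ G0` and `G0 @ Tn` are the expensive steps that generalized multiplication is meant to replace.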

This algorithm has an advantage over basis truncation algorithms: it allows
the $(n+1)$-th site to see a $T^n$ which includes the physics of all $n$ sites already
included in $T^n$, even those that are far away from site $n+1$. Therefore, the summation
algorithm's final result for the scattering matrix includes the physics of scattering
diagrams which start at a site $v$, go very far away from it, and then make their way back
to some other site $w$ which is close to the original site $v$. In contrast, the Green's
functions produced by basis truncation algorithms do not contain any information
about scattering diagrams which travel outside of the truncation volume.

Unfortunately, there is a fly in the ointment, caused by an interaction
between the summation formula and the generalized multiplication. The summation
algorithm is not self-consistent: changing the order in which individual sites are
added to the total will change the final result. Therefore this summation algorithm
is useful mainly for quickly computing an initial trial solution for the integral equation
of the next algorithm, which is self-consistent.
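The order dependence can be explored with a small experiment. The sketch below is hedged in two ways: `gmul` is only a stand-in for generalized multiplication (it assumes "multiply, then discard every element with $|i - j| > R$", which may differ in detail from the definition in section 7.1.3), and the chain, potential, and function names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, E = 10, 0.2 + 0.4j
I = np.eye(N)
H0 = -(np.eye(N, k=1) + np.eye(N, k=-1))
v = rng.uniform(-1.0, 1.0, size=N)
G0 = np.linalg.inv(E * I - H0)
dist = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

def gmul(A, B, R):
    """Assumed stand-in for generalized multiplication: multiply, then truncate."""
    return np.where(dist <= R, A @ B, 0.0)

def summed_T(order, R):
    """Add sites in the given order, truncating every product at radius R."""
    T = np.zeros((N, N), dtype=complex)
    for x in order:
        t_x = v[x] / (1.0 - v[x] * G0[x, x])
        M = gmul(gmul(G0, T, R), G0, R)
        local = t_x / (1.0 - t_x * M[x, x])
        left = (I + gmul(T, G0, R))[:, [x]]
        right = (I + gmul(G0, T, R))[[x], :]
        T = T + np.where(dist <= R, local * (left @ right), 0.0)
    return T

forward = list(range(N))
backward = forward[::-1]
for R in (2, N - 1):
    diff = np.abs(summed_T(forward, R) - summed_T(backward, R)).max()
    print(f"R = {R}: max |T_forward - T_backward| = {diff:.2e}")
```

With $R = N - 1$ nothing is truncated, so the two orders agree to rounding error; for small $R$ one should expect a visible difference, which is the lack of self-consistency described above.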

I have not seen either formula 7.6 or this algorithm presented anywhere in the
literature.

7.3.2.2 Frobenius Summation

I just presented a summation algorithm for calculating the scattering matrix $T$, and
then invoked equation 7.4 to calculate the Green's function $G$. It is also possible to
create summation algorithms which calculate the Green's function directly without
obtaining the scattering matrix at an intermediate step. However, one pays for this
convenience by not treating $E - H_0$ exactly.

Before presenting the equation for summing Green's functions, I need to introduce
some notation: $G^n$ is the Green's function of $n$ sites taken together. $G^{n+1}$ is
the Green's function of all $n$ sites plus another site $x$, all taken together. I
also define a site-wise Green's function $G_v = (P_v (E - H_0 - V) P_v)^{-1}$, where $P_v$ is
the projection operator which projects out only the states at site $v$. In other words,
$G_v$ is the Green's function of the single site $v$ with all the other sites eliminated
from the problem. It is easily computed by simply inverting matrices at that single
site $v$.

If $V$ is local, the Green's function addition formula is:

$$G^{n+1} = G^n + (1 + G^n H_0) P_x (1 - G_x H_0 G^n H_0)^{-1} G_x (1 + H_0 G^n) \qquad (7.7)$$

Again generalizations to adding several sites at a time and to non-local potentials 
are straightforward. 

Equation 7.7 is essentially just a rewrite of equation 4.4 from Börm, Grasedyck,
and Hackbusch's work on hierarchical matrices [88]. They call this the Frobenius
formula, although a search on Google reveals that the Frobenius formula usually
refers to something from combinatorics. At any rate, this formula is obvious enough
that I would not be surprised if it were rather old.

Just as with the summation equation 7.6 for scattering matrices, achieving $O(N)$
times will require using generalized multiplication. And again like scattering matrix
summation, this formula is not self-consistent; the final result depends on the order
in which sites are added unless one does not use generalized multiplication. The
big difference between the two algorithms is that in this one $G_0$ is missing, which
reflects the fact that we are not treating $E - H_0$ exactly. Formula 7.7 is a worse
approximation than formula 7.6 for scattering matrices.
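The Green's function addition formula can be checked in the same way as the scattering matrix version. The sketch assumes the block-inversion (Schur complement) reading $G^{n+1} = G^n + (1 + G^n H_0) P_x (1 - G_x H_0 G^n H_0)^{-1} G_x (1 + H_0 G^n)$, a hypothetical chain in which $H_0$ holds only the hopping and $V$ only the diagonal, one basis state per site, and ordinary dense products.

```python
import numpy as np

rng = np.random.default_rng(3)
N, E = 6, 0.2 + 0.4j
I = np.eye(N)
H0 = -(np.eye(N, k=1) + np.eye(N, k=-1))   # hopping only, zero diagonal
v = rng.uniform(-1.0, 1.0, size=N)          # local potential

def add_site(Gn, x):
    """Add site x to the Green's function Gn of the sites included so far."""
    g_x = 1.0 / (E - v[x])                  # single-site Green's function G_x
    M = H0 @ Gn @ H0
    local = g_x / (1.0 - g_x * M[x, x])     # (1 - G_x H0 G^n H0)^{-1} G_x at site x
    left = (I + Gn @ H0)[:, [x]]            # (1 + G^n H0) P_x
    right = (I + H0 @ Gn)[[x], :]           # P_x (1 + H0 G^n)
    return Gn + local * (left @ right)

G = np.zeros((N, N), dtype=complex)         # no sites included yet
for x in range(N):
    G = add_site(G, x)

G_direct = np.linalg.inv(E * I - H0 - np.diag(v))
print(np.allclose(G, G_direct))
```

Note that no free propagator appears anywhere in the loop: the hopping enters only through explicit factors of `H0`.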

7.3.2.3 Sherman-Morrison-Woodbury Summation

The numerical linear algebra community seems to favor a different formula for
adding inverses: the Sherman-Morrison-Woodbury formula. I presume, without any
evidence, that the reason has to do with improved numerical stability, at least in its
typical usage scenario. The Sherman-Morrison-Woodbury formula is typically used
for figuring out how a solution $x$ of the linear equation $Ax = b$ changes when $A$ is
altered; or, in other words, how $x = A^{-1} b$ changes. The publication on its stability
which is generally referenced was written by Yip [89].
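As an illustration of that typical usage, here is the textbook rank-one case (the Sherman-Morrison formula), which updates the solution of $Ax = b$ when $A$ is replaced by $A + u w^T$. This is the standard identity rather than the site-addition form developed in this section; the matrix, vectors, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 5
A = 4.0 * np.eye(N) + 0.5 * rng.normal(size=(N, N))   # well-conditioned test matrix
b = rng.normal(size=N)
u = 0.3 * rng.normal(size=N)                           # rank-one change A -> A + u w^T
w = 0.3 * rng.normal(size=N)

A_inv = np.linalg.inv(A)
x_old = A_inv @ b                                      # solution before the change

# (A + u w^T)^{-1} = A^{-1} - A^{-1} u w^T A^{-1} / (1 + w^T A^{-1} u)
denom = 1.0 + w @ A_inv @ u
x_new = x_old - (A_inv @ u) * (w @ x_old) / denom      # solution after the change

print(np.allclose((A + np.outer(u, w)) @ x_new, b))
```

The update reuses `A_inv` and needs only matrix-vector products, which is the attraction of the formula when $A^{-1}$ is already known.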

Before giving the formula I need to introduce a little bit of new formalism. I
already defined the projection operator $P_v$ of a single site $v$. Now I define the
projection operator which projects out all the basis states of the first $n$ sites:
$P_n = \sum_{v \le n} P_v$. Obviously $P_{n+1} = P_n + P_x$. In this notation, the Green's function of $n$
sites can be rewritten as:

$$G^n = (P_n (E - H) P_n)^{-1} \qquad (7.8)$$

With this formalism, it is clear that the only difference between $G^n$ and $G^{n+1}$
is that I have changed from using $P_n$ to using $P_{n+1}$ in equation 7.8. This change is
equivalent to starting with the $n$-site matrix $P_n (E - H) P_n$ and then subtracting
another matrix $Y$ defined by $-Y = P_x (E - H) P_n + P_n (E - H) P_x + P_x (E - H) P_x$.
The first two terms in $Y$ connect site $x$ to the other $n$ sites, and simplify to
$P_x H_0 P_n + P_n H_0 P_x$. The last term in $Y$ is the inverse of the single site Green's
function $G_x$ which I defined while discussing the Frobenius formula.

The Frobenius formula separates the three terms in $Y$, which enter separately
into the summation equation 7.7: the first two terms in $Y$ enter as instances of
$H_0$, while the third term enters as the single site Green's function $G_x$. In contrast,
the Sherman-Morrison-Woodbury formula factors $Y$ into two factors $Y = BF$, where the
two factors $B$ and $F$ are required to have a special structure. Specifically, they
connect site $x$ with the other $n$ sites: $B = P_n B P_x$ and $F = P_x F P_n$. As long as
$P_x (E - H) P_x$ doesn't have any zero eigenvalues, it is always possible to factor $Y$
into two factors $B$ and $F$ with these properties. The actual numerical calculation
does not require much time as long as $H_0$ is close to diagonal.

With this preparation, I am finally ready to write the Sherman-Morrison-Woodbury
formula for adding a single site $x$ to the Green's function of $n$ sites. If $V$ is local,
the Green's function matrix addition formula is:

$$G^{n+1} = G^n + (1 + G^n B)(1 - F G^n B)^{-1}(1 + F G^n) \qquad (7.9)$$

Again generalizations to adding several sites at a time and to non-local potentials 
are straightforward. 

Just as with the other summation formulae, achieving $O(N)$ performance
requires using generalized multiplication. And again like the other formulae, this
formula is not self-consistent; the final result depends on the order in which sites
are added unless one does not use generalized multiplication. Note that equation
7.9 contains no instances of the free propagator $G_0$, which signals that it is less
accurate than equation 7.6.

7.3.2.4 Sherman-Morrison-Woodbury Analogue for Scattering Matrices

There is an easy analogue of the Sherman-Morrison-Woodbury formula for $T$ matrix
addition; one just factors $P_x G_0 T^n G_0 P_x$ into two factors $B$ and $F$. Equation 7.6
then becomes:

$$T^{n+1} = T^n + (1 + T^n G_0) P_x (1 - T_x B F)^{-1} T_x (1 + G_0 T^n) \qquad (7.10)$$

This formula is perhaps the best of both worlds, combining the additional exactitude
of the free propagator $G_0$ and the numerical stability of factoring. However, I'm
not sure if the factoring really makes much difference in this case.

This completes my presentation of summation formulae for evaluating the Green's 
function. 

7.3.3 Integral Equations 

In this section I will discuss four different integral equations which can be used
to calculate the Green's function. These integral equations are all simply linear
equations of the type:

$$A x = b \qquad (7.11)$$

However, the equations in this section are properly called integral equations because
the unknown vector $x$ is a matrix, i.e. either the Green's function or the scattering
matrix, $b$ is also a matrix, and $A$ is a tensor with four indices.

The simplest integral equation is the definition of the Green's function:
$(E - H) G = 1$. This equation corresponds perfectly with the linear equation 7.11:
the input vector $b$ is the identity matrix, $A$ is $E - H$, and the unknown $x$ is the
Green's function $G$. Evaluating the Green's function $G$ is just a matter of solving
this linear equation.

Many iterative algorithms have been invented which can solve most systems of
linear equations with $N$ unknowns in $O(N)$ or $O(N \ln N)$ time. In our case, $G$ is
an $N \times N$ matrix and has $N^2$ unknowns, so these algorithms would take at least
$O(N^2)$ time. Therefore $O(N)$ scaling can be obtained only by making a further
approximation: one evaluates the matrix product $(E - H) G$ with generalized
multiplication. As a consequence of the generalized multiplication, all matrix elements
of $G$ which are far from the diagonal are ignored, and $G$ contains only $O(N)$
unknowns, so that one has good reason to hope that a standard algorithm for solving
linear equations will achieve $O(N)$ scaling.

Note that this algorithm is already an improvement on the summation
algorithms: a perturbative analysis shows that this algorithm shares the summation
algorithms' strength of allowing scattering diagrams to travel through the entire
volume. However, this integral equation algorithm is unlike those algorithms in that
it is self-consistent; there is no question of the result depending on the order in
which sites are added, and instead all sites are treated on an equal basis.

This first integral equation can be improved upon: one can see that it does not
contain any instance of $G_0$, which means that it is not taking advantage of the
possibility of treating $E - H_0$ exactly. Therefore I present the next two integral
equations:

$$(1 - G_0 V) G = G_0 \qquad (7.12)$$
$$(1 - V G_0) T = V \qquad (7.13)$$

Both equations can be derived very easily from the basic definitions of scattering
theory. The unknown in equation 7.12 is the Green's function $G$, while the unknown
in equation 7.13 is the scattering matrix $T$. Again generalized multiplication can be
used to reduce the number of variables to $O(N)$. It is perhaps more reasonable to
expect that the scattering matrix would die off quickly away from the diagonal than
that the Green's function would die off quickly away from the diagonal; probably
equation 7.13 is better than equation 7.12. Yet both equations share an important
problem: the operators $A = 1 - G_0 V \neq A^T$ and $A = 1 - V G_0 \neq A^T$ are not
symmetric under transpositions even when the underlying Hamiltonian $H = H^T$ is
symmetric under transpositions. When the operator $A$ in the linear equation 7.11
is not symmetric under transpositions, it becomes much more difficult to find a
good algorithm for solving the linear equation numerically, and numerical stability
problems can be much worse.
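Both linear equations, $(1 - G_0 V) G = G_0$ and $(1 - V G_0) T = V$, and the loss of transpose symmetry can be seen directly in a small dense example. This is a sketch on a hypothetical chain with a real symmetric Hamiltonian; it uses a dense solver and no generalized multiplication, so it says nothing about $O(N)$ scaling.

```python
import numpy as np

rng = np.random.default_rng(5)
N, E = 6, 0.2 + 0.4j
I = np.eye(N)
H0 = -(np.eye(N, k=1) + np.eye(N, k=-1))      # real symmetric hopping
V = np.diag(rng.uniform(-1.0, 1.0, size=N))   # real diagonal potential
G0 = np.linalg.inv(E * I - H0)

A_G = I - G0 @ V                   # operator acting on the unknown G
A_T = I - V @ G0                   # operator acting on the unknown T
G = np.linalg.solve(A_G, G0)       # (1 - G0 V) G = G0
T = np.linalg.solve(A_T, V)        # (1 - V G0) T = V

G_exact = np.linalg.inv(E * I - H0 - V)
print(np.allclose(G, G_exact), np.allclose(G0 + G0 @ T @ G0, G_exact))
print(np.allclose(A_G, A_G.T), np.allclose(A_G, A_T.T))
```

The second line of output compares each operator with its transpose: neither operator equals its own transpose, although in this symmetric example each is the transpose of the other.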

Fortunately there is a fourth integral equation which both preserves transpose
symmetry and uses the free propagator $G_0$:

$$T - \hat{T} \tilde{G}_0 T \tilde{G}_0 \hat{T} = \hat{T} + \hat{T} \tilde{G}_0 \hat{T}, \qquad \tilde{G}_0 = G_0 - \sum_v P_v G_0 P_v \qquad (7.14)$$

$\hat{T} = \sum_v T_v$ is just the sum of all the single-site scattering matrices, and $\tilde{G}_0$ is just the free
propagator with its site-diagonal part removed. A little thought will show that the left hand
side can be expressed as $Ax$ with $x = T$ and a proper choice of $A$. Although
equation 7.14 is fairly obvious once you think about it, I have not seen it elsewhere
in the literature. It is perhaps the result that I am most proud of in this chapter.

This is the final integral equation: it gives a self-consistent $O(N)$ algorithm for
calculating the scattering matrix $T$ and the Green's function $G$.
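The fourth integral equation can be checked against a direct dense calculation. The sketch below assumes the reading $T - \hat{T} \tilde{G}_0 T \tilde{G}_0 \hat{T} = \hat{T} + \hat{T} \tilde{G}_0 \hat{T}$, with $\hat{T}$ the sum of the single-site scattering matrices and $\tilde{G}_0$ the free propagator with its site-diagonal part removed. The chain, the potential, and the names `T_hat` and `G0_tilde` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
N, E = 6, 0.2 + 0.4j
I = np.eye(N)
H0 = -(np.eye(N, k=1) + np.eye(N, k=-1))
v = rng.uniform(-1.0, 1.0, size=N)
G0 = np.linalg.inv(E * I - H0)

g0_diag = np.diag(G0)
T_hat = np.diag(v / (1.0 - v * g0_diag))     # sum over sites of T_v (one state per site)
G0_tilde = G0 - np.diag(g0_diag)             # G0 minus its site-diagonal part

# Reference: total scattering matrix from its definition.
T = np.linalg.inv(I - np.diag(v) @ G0) @ np.diag(v)

lhs = T - T_hat @ G0_tilde @ T @ G0_tilde @ T_hat
rhs = T_hat + T_hat @ G0_tilde @ T_hat
print(np.allclose(lhs, rhs), np.allclose(T, T.T))
```

In this form the unknown $T$ is sandwiched between the same symmetric factors on both sides, so transposing the whole equation returns the same equation, which is the property the simpler equations lack.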

7.3.4 Functional Minimization Algorithms 

The matrix functional $Z = \mathrm{Tr}\left((E - H) F^2 - 2F\right)$ has a single global minimum at
$F = (E - H)^{-1}$. Therefore one can evaluate the Green's function by minimizing
this functional. Like the integral equation approach, the result is self-consistent and
contains scattering diagrams which travel throughout the system.

This approach can easily be generalized to solve any of the integral equations
listed in section 7.3.3.
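A minimal sketch of the minimization idea, under assumptions that go beyond the text: the functional is normalized as $Z(F) = \mathrm{Tr}(M F^2 - 2F)$ with $M = E - H$, so that its stationary point $MF + FM = 2$ is exactly $F = M^{-1}$; $M$ is real, symmetric, and positive definite (a real energy above the spectrum); and plain fixed-step gradient descent with dense products is used. The chain and step size are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(7)
N, E = 6, 4.0
I = np.eye(N)
H = -(np.eye(N, k=1) + np.eye(N, k=-1)) + np.diag(rng.uniform(-0.5, 0.5, size=N))
M = E * I - H                        # symmetric, eigenvalues between 1.5 and 6.5

def Z(F):
    """Assumed normalization of the functional: Tr(M F^2 - 2 F)."""
    return np.trace(M @ F @ F - 2.0 * F)

F = np.zeros((N, N))                 # trial solution
step = 0.1                           # must stay below 1 / (largest eigenvalue of M)
for _ in range(300):
    grad = M @ F + F @ M - 2.0 * I   # derivative of Z with respect to F
    F = F - step * grad

print(np.allclose(F, np.linalg.inv(M)))
```

Every update treats all matrix elements of `F` on an equal footing, so there is no dependence on an ordering of the sites.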

7.3.5 Iteration 

If one knows that a matrix $M$ is positive definite (no eigenvalues have negative or
zero real parts), and there are no rounding errors and no generalized multiplication,
then the following iteration algorithm will converge to $M^{-1}$.

First, choose a matrix $X$ which is proportional to the identity. Its proportionality
constant should be positive and equal to or less than $m$, where $m$ is the smallest
real part of the inverse eigenvalues of $M$. In some physical cases $m$ may be
easily estimated using very simple heuristics, and if $M$ is real it can be computed
quite quickly with a Lanczos type of algorithm. There may also be an analogue of
the Lanczos algorithm for the case when $M$ is complex, but I'm not sure.

Next, apply the following iteration until convergence is achieved:

$$X \rightarrow 2X - M X^2 \qquad (7.15)$$

The convergence is quadratic, meaning that the number of digits of accuracy doubles 
at every iteration. 
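The iteration $X \rightarrow 2X - M X^2$ can be sketched with dense products. The example assumes a hypothetical real symmetric chain Hamiltonian and a real energy well above its spectrum, so that $M = E - H$ is positive definite with a modest spread of eigenvalues; the starting constant uses the matrix 1-norm as a cheap upper bound on the largest eigenvalue, which is a choice made here, not one taken from the text.

```python
import numpy as np

rng = np.random.default_rng(8)
N, E = 6, 8.0
I = np.eye(N)
H = -(np.eye(N, k=1) + np.eye(N, k=-1)) + np.diag(rng.uniform(-0.5, 0.5, size=N))
M = E * I - H                          # positive definite, eigenvalues in (5.5, 10.5)

X = I / np.linalg.norm(M, 1)           # proportional to the identity, constant <= 1/lambda_max
residuals = []
for _ in range(10):
    X = 2.0 * X - M @ X @ X            # the iteration X -> 2X - M X^2
    residuals.append(np.linalg.norm(I - M @ X))

print(np.allclose(X, np.linalg.inv(M)))
```

Printing `residuals` shows the norm of $1 - MX$ after each step, which is a convenient way to watch the quadratic convergence.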

The big restriction, of course, is the requirement of positive definiteness. When
computing the Green's function $M = E - H$; the positive definiteness constraint
requires that the energy $E$ be larger than the largest eigenvalue of the Hamiltonian
$H$. In addition to this constraint, there are also complexities caused by rounding
errors.

And then there is the issue that turning this iteration into an $O(N)$ method
requires that the $M X^2$ term be evaluated using generalized multiplication,
which necessarily ruins the proof that this iteration converges to the right result, or
converges at all. It would be interesting to analyze the iteration from the viewpoint
of generalized multiplication being a tensor, and see what that says about the
iteration's result.

Since any even power of a real symmetric matrix is positive semidefinite, this algorithm can
also be construed as a way of computing $X^{-2n}$, where $n$ is a positive integer and $X$ is real,
symmetric, and has no zero eigenvalues.

This algorithm is just one of a widely known family of iteration algorithms which 
converge to various functions. The numerical analyst Marc Van Barel told me about 
one of them at a summer school. I don't remember which algorithm he told me 
about, but it might have been this one. At any rate, I ended up deriving this one 
on my own (very simple minded stuff), so I could be the first to derive it, though I 
really doubt this. 

This completes my list of $O(N)$ algorithms for evaluating the Green's function.
Chapter 8 will discuss several other fast algorithms for the Green's function which
are based on renormalization ideas.

7.4 Outstanding Questions 

Very little is known either theoretically or computationally about the range of validity
of $O(N)$ algorithms, and almost no work has been done exploring any of their
mathematical aspects. (In fact, this thesis probably contributes as much to these
questions as the previously published results.) I list here some of the most important
outstanding questions:

• When is each algorithm valid, and what accuracy will it deliver? Answering 
this question will involve exploring issues including but not limited to the 
coherence length, the localization length, and metals vs. insulators. 

• The existing algorithms are designed only for a special class of matrices: those
with transpose symmetry $H = H^T$, and in general with $H$ real. Perhaps
they can be easily generalized to complex symmetric matrices $H = H^T$ and
hermitian matrices $H = H^\dagger$. However there is also a need for $O(N)$ algorithms
for other classes of matrices, in particular the Dirac operator which arises
in lattice QCD.

• Convergence: Does a given algorithm converge, and if so, does it converge to
the correct result? How fast is the convergence? Is there a unique solution
to the algorithm; if not, how many are there?

• The relation of generalized multiplication to normal multiplication.

• Stability and accuracy under rounding errors. 

• Understanding the numerics of the Chebyshev expansion. I am unaware of 
published results on the stability and accuracy of even a scalar Chebyshev 
expansion when rounding errors are present. Moreover, there is reason to 
hope that the Chebyshev expansion can be improved upon, since it gives 
uniform convergence across the entire spectrum. In real life there are more 
eigenvalues in some parts of the spectrum than others, and one can make 
statistical estimates with good accuracy of this eigenvalue density. Therefore 
it would be useful to find an expansion which converges faster in certain parts 
of the spectrum than in others. 

• Some of the algorithms are nonlinear; for instance some of the functional 
minimization algorithms. Yet they are supposed to converge to matrix func- 
tions, which can be derived using only linear algebra. Therefore we have an 
interesting challenge of exploring algorithms which are in a grey area between 
linearity and nonlinearity.

Algorithms for Problems
with Multiple Length Scales 

I have often quoted the result that $O(N)$ algorithms depend on the number of basis
states $N$ in a linear fashion; this is why they are called $O(N)$ algorithms. However I
have never discussed the proportionality constant relating the computational cost to
the basis size. In some systems the proportionality constant can be enormous and
$O(N)$ algorithms can be as impractical as any other algorithm. The focus of
this chapter will be on understanding the proportionality constant and finding ways
to reduce it.

If $n$ is the number of sites within the truncation radius $R$, and $b$ is the number
of basis states per site, then most of the $O(N)$ algorithms listed in chapter 7
require $O(b^3 n^2 N)$ time. Note that the prefactor is just $b^3 n^2$. This implies that
$O(N)$ algorithms are extremely sensitive to the truncation radius $R$. For instance,
in a three dimensional system the number of sites inside the truncation radius $R$
is proportional to $R^3$, and therefore $O(N)$ algorithms scale with $R$ as $R^6$. As a
consequence, $O(N)$ algorithms are very slow whenever the truncation radius is much
larger than the lattice spacing. Mind you, they are still much faster than $O(N^3)$
algorithms, but both are very slow.
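The sensitivity to the truncation radius can be made explicit with a rough operation count. This is a sketch under assumptions that are not spelled out in the text: every matrix keeps about $bn$ non-zero elements in each of its $bN$ rows, a truncated product is evaluated only for the elements that are kept, and each kept element needs a sum over about $bn$ terms.

```latex
\begin{align*}
\text{cost} &\sim \underbrace{bN}_{\text{rows}} \times
               \underbrace{bn}_{\text{kept elements per row}} \times
               \underbrace{bn}_{\text{terms per element}}
             = b^3 n^2 N, \\
n &\propto R^3 \ \text{(three dimensions)}
  \quad\Longrightarrow\quad \text{cost} \propto b^3 R^6 N .
\end{align*}
```

Under these assumptions, doubling the truncation radius in three dimensions multiplies the cost by $2^6 = 64$ while the scaling with $N$ stays linear.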

In chapters |31 and I showed that in disordered systems the truncation radius 
should be set to match the coherence length. In insulators, another length
characterizing the insulator may determine the truncation radius. In either case there is a
single physical length $\ell$ which sets the truncation radius. If this length $\ell$ is the only
important length scale in the problem, then one can safely choose a lattice spacing
which isn't much smaller than $\ell$, and $O(N)$ algorithms are applicable. However,
if there are other important length scales significantly smaller than $\ell$, then $O(N)$
algorithms are as helpless as normal $O(N^3)$ algorithms.

Many physical systems do exhibit several important length scales. Here are three 
examples: 

1. In physical chemistry and condensed matter, each atom's core electrons are
always very tightly bound around their owning atom, while the valence electrons
are less tightly bound, and in metals are even delocalized. Pseudopotentials
traditionally have been used in order to allow omission of the core electrons
and their corresponding length scales from calculations. A short discussion of
pseudopotentials and their limitations can be found in section 8.5.3.

2. The renormalization group has taught us to expect that the characteristic
length scale of the physics should depend on the energy of the probe. At
some energies you will see very detailed features; at others you will see only
coarse grained features. A system close to a phase transition exhibits scaling
behavior, with non-trivial physics at every length scale.



3. A third example comes from metallic alloys. Self-consistent $O(N)$
calculations of alloys require supercomputers because a large truncation radius
is required. However, the truncation radius (and computational load) may
be greatly reduced by explicitly modifying one's equations to acknowledge 
that outside of the localization zone there is an effective medium of average 
atoms [33,34,77,90]. This indicates the presence of two length scales: a 
short length scale within which it is important to know each atom's type, and 
a longer length scale at which it is important to know each atom's presence, 
but not its type. 

All of these examples are problematic for $O(N)$ methods. There is one $O(N)$
algorithm which is able to treat example 3, but its success comes from supplementing
$O(N)$ ideas with special knowledge about the longer length scale.

Are there algorithms which scale optimally when solving linear algebra problems
involving several or many length scales? Such algorithms would be immensely useful,
permitting calculations of previously inaccessible systems, speeding up current $O(N)$
calculations, and allowing all-electron calculations without pseudopotentials.

The answer is yes. In section 8.1 I describe how to explicitly insert information
about length scales into one's calculations by choosing a special basis and a slightly
modified version of generalized multiplication. The result is a very fast algorithm for
matrix multiplication, with a prefactor governed not by the longest length scale in
the system but instead by the smallest length scale in the system. Section 8.2 points
out that many of the $O(N)$ algorithms of chapter 7 can be adapted easily to use
the new basis and new multiplication. Then in section 8.3 I discuss another class
of algorithms which obtain accelerations by renormalizing away the small length
scales; i.e. throwing away most of the basis and then changing the calculation to
compensate for the missing information. Many of these algorithms are my original
contribution.

While the ideas of multiple scales have been explored very thoroughly in the
context of quantum field theory, they are much less understood in the context of
linear algebra. In section 8.4 I present an interesting continuum model which may
provide some insight into the physics of multiple length scales and the behavior
of multi-scale algorithms. Section 8.5 completes the chapter with a review of the
previously developed approaches to solving problems with several length scales:
multigrid algorithms, pseudopotentials, and hierarchical matrices. The multigrid
algorithms are best suited to solving systems of linear equations, pseudopotentials
handle only a very narrow subset of problems, and hierarchical matrices are still
very new and their capabilities are not well understood. I do not discuss attempts
to implement the renormalization group numerically, because these attempts seem
best suited to field theory calculations, not to solving linear algebra problems.

8.1 Multi-Scale Bases

O(N) algorithms sort information into two categories: all information within the
localization volume $n$ is important, while nothing outside that volume is important.
This approach is appropriate for systems with only one important length scale.
Efficient algorithms for systems with two or more length scales need to sort information
with more care: some information is important only on a short scale, some on a longer
scale, some on an even longer scale, etc. In this section I will outline how to do this
sorting.

Recall that O(N) algorithms sort information by using a truncation volume,
typically a sphere with a truncation radius $R$. They then systematically ignore all
matrix elements $\langle x|M|y\rangle$ such that $|x - y| > R$.
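
The single-radius truncation can be illustrated with a minimal Python sketch. It assumes one-dimensional site positions and a dense NumPy matrix; the function name `truncate` is hypothetical and chosen only for this example.

```python
import numpy as np

def truncate(M, positions, R):
    """Zero every element <x|M|y> whose sites are farther apart than R."""
    # Pairwise distances |x - y| between 1-D site positions.
    dist = np.abs(positions[:, None] - positions[None, :])
    return np.where(dist <= R, M, 0.0)

positions = np.arange(6, dtype=float)   # six sites on a line
M = np.ones((6, 6))                     # a fully extended matrix
M_trunc = truncate(M, positions, R=1.0)
kept = int(np.count_nonzero(M_trunc))   # only the tridiagonal band survives
```

With a truncation radius of one lattice spacing only the diagonal and its two neighbouring bands are kept, so the number of stored elements grows linearly with the number of sites.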

It seems to me that the most natural generalization to several length scales is
to assign different truncation radii to different basis states, thus signalling which
basis states are important at which length scales. One would have to sort the basis
into groups of basis states which share the same truncation radius. Mathematically
this would correspond to first choosing a basis, and then sorting that basis by
choosing a set of projection operators $P_i$. Each projection operator would have an
associated truncation radius $R_i$ and thus would signal that its states are important
only at length scales less than or equal to $R_i$. I will call a basis which has been
sorted and supplemented with truncation information as sketched in this paragraph
a multi-scale basis.^1

A multiscale basis should have the structure of a tree (in the mathematical sense
of a tree), with the leaves on the tree being the original fine mesh of sites. I will
denote the levels in the tree with an index $l$. Each level in the tree will correspond
to a particular length scale, and will have its own truncation radius $R_l$. The leaf
level $l = 1$ of the tree will have the smallest truncation radius, and the highest level
in the tree $l = L$ will have the longest truncation radius. Each node in the tree will
be responsible for modelling the physics of a particular physical volume centered at
a particular position, and will own basis states appropriate for this task. Children
of any particular node will be responsible for modelling portions of the volume that
their parent owns, and will own finer-scale basis states of their own. One can expect
that the basis states owned by any level $l$ of the tree will not have spatial extents
exceeding that same level's truncation radius $R_l$.

The details of constructing a tree, of choosing truncation radii for each level
in the tree, and of assigning basis states to nodes, will all depend on the physical
problem being solved. The field of multigrid algorithms, which I will discuss in
section 5.5.4, has already spent a lot of time studying how to create appropriate
trees. For instance, geometric multigrid algorithms assume that the starting mesh
is a regular lattice, and proceed to iteratively block the fine lattice into coarser
lattices. In $d$ dimensions, each point in a coarse lattice corresponds to $c^d$ points in
the next finer lattice; $c$ is an implementation dependent parameter. The problem
of assigning basis states to nodes has also been studied a bit by the multigrid
community, particularly in efforts to apply multigrid ideas to lattice QCD. I will
review these efforts briefly in section 5.5.4.4. I am not aware of any studies on how
to choose appropriate truncation radii for a given physical problem.

^1 I have seen a multiscale approach like this in two places in the literature: hierarchical
matrices, and a paper by Yokojima, Wang, Zhou, and Chen. I will discuss the former in section
8.5. The latter authors used an energy based heuristic to impose a multi-scale cutoff on the
density matrix [91]. They gave higher energy states larger localization volumes. There are
also wavelet bases, but these are oriented toward models on the continuum, not to lattices.

I assume that there is some original basis in which the problem is originally 
formulated, and that all the input data are given in terms of the original basis. 
Therefore one must have algorithms for transforming from the original basis to one's
multiscale basis, and vice versa. I now sketch the outlines of the algorithm for 
transforming from the original basis to the multiscale basis. The algorithm for 
transforming back to the original basis is very similar and I leave it to the reader. 

I assume that the original basis is local, meaning that each basis state occupies
one and only one lattice site. I specify individual nodes with the indices $l$ and $n$,
where $l$ specifies the level and $n$ specifies the $n$-th node in level $l$. The transformation
from the original basis to the multiscale basis will start at the lowest level, the leaves.
Each leaf corresponds to a site in the lattice. At each leaf $(l = 1, n)$ in the lattice
a unitary transformation $U_{ln}$ will be applied to the leaf's basis states, and then
the states will be separated into two classifications with a projection operator $P_{ln}$.
The states corresponding to $(1 - P_{ln})U_{ln}$ will be assigned to the leaf node $(l = 1, n)$,
while the states corresponding to $P_{ln}U_{ln}$ will be assigned to its parent at level $l = 2$.

At this point the transformation will have finished with the leaf level $l = 1$ of the
tree. Some basis states will have been assigned to the leaves, and some will have
been moved to the next higher level $l = 2$ in the tree. Thus the transformation will
have separated the shortest-scale physics (corresponding to the leaf nodes) from
the longer scales. It will now repeat the separation at level $l = 2$, doing unitary
transformations at each $l = 2$ node and then assigning some states to the $l = 2$
nodes and moving the rest up to $l = 3$. This process will continue iteratively until
the top of the tree has been reached and all states have been assigned to appropriate
nodes.
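
The level-by-level separation can be sketched in Python under strong simplifying assumptions that are mine, not taken from the text: a binary tree, exactly two states arriving at every node, a fixed 2x2 Hadamard matrix standing in for the unitary $U_{ln}$, and a projector that always sends the symmetric combination up to the parent. The function name `to_multiscale` is hypothetical.

```python
import numpy as np

# Fixed 2x2 unitary used at every node (an illustrative choice).
HADAMARD = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

def to_multiscale(vec):
    """Return one array of coefficients per tree level, leaf level first.

    Assumes len(vec) is a power of two and that every node receives two
    states: leaves from the original basis, higher nodes from their children.
    """
    levels = []
    carried = np.asarray(vec, dtype=float)
    while carried.size > 1:
        pairs = carried.reshape(-1, 2)        # the two states held by each node
        rotated = pairs @ HADAMARD.T          # apply the unitary at each node
        levels.append(rotated[:, 1].copy())   # (1 - P) part stays at this level
        carried = rotated[:, 0].copy()        # P part moves up to the parent
    levels.append(carried)                    # the state that reaches the top
    return levels

levels = to_multiscale(np.arange(8.0))
sizes = [len(level) for level in levels]
total_norm = sum(float(np.sum(level ** 2)) for level in levels)
```

Because every step is unitary, the squared norm summed over all levels equals that of the input vector, and the work per level halves on the way up, so the total work for an unlocalized vector is proportional to its length.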

Let me estimate the time requirements of basic arithmetic calculations with a 
multiscale basis. It is impossible to make such estimates without knowing the tree 
layout and the number of basis states per node; I will concentrate on multiscale 
bases characterized by self-similarity between the length scales: 

• There are a total of N basis states. 

• Each node (except the leaf nodes) has $\alpha$ children.

• Each child node owns a number of basis states equal to $\beta$ times the number
of basis states owned by its parent.

• I require that most of the basis states be at the bottom of the tree. Mathematically
this means requiring that $\alpha\beta > 1$.

• The truncation radii $R_l$ vary in a self-similar fashion so that the number of
level $l$ nodes inside the truncation radius $R_l$ is a constant, the same for every
level.
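
The counting behind the condition $\alpha\beta > 1$ can be made explicit with a short Python sketch. The function name `states_per_level` and the numbers in the example are hypothetical; the only input from the text is that a parent level has $1/\alpha$ as many nodes as its child level and that each parent node owns $1/\beta$ as many states as each child node.

```python
def states_per_level(n_leaf_nodes, b, alpha, beta, n_levels):
    """Number of basis states owned by each level, leaf level first."""
    counts = []
    nodes, per_node = float(n_leaf_nodes), float(b)
    for _ in range(n_levels):
        counts.append(nodes * per_node)
        nodes /= alpha       # a parent level has 1/alpha as many nodes
        per_node /= beta     # each parent node owns 1/beta as many states
    return counts

counts = states_per_level(n_leaf_nodes=64, b=8, alpha=2, beta=2, n_levels=4)
leaf_fraction = counts[0] / sum(counts)
```

Successive levels shrink by the factor $\alpha\beta$, so the level populations form a geometric series and the leaf level holds at least a fraction $1 - 1/(\alpha\beta)$ of all states.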

The time required to transform an unlocalized vector from the original basis to this
self-similar basis is of order O(N). However, if the original vector was localized in
the original basis, and had a support of only $m$ basis states in the original basis, then
the total time required to do the transformation is of order $O(m + \ln N - \ln m)$.

I now analyze the problem of transforming a matrix from the original basis
to the self-similar basis. It is best to do this by first applying the transformation
on one side of the matrix and then on the other side, i.e. $H \to (UH)U^\dagger$. The
matrix multiplication $UH$ can be understood as simply a process of breaking $H$
into columns and then transforming each column separately. Similarly, the matrix
multiplication $(UH)U^\dagger$ is best understood as a transformation of each row in $UH$.
The total time required to transform $H$ depends very much on whether or not it
has non-zero matrix elements which are far from the diagonal; i.e. whether it is
close to local. If $H$ is extended and all its matrix elements are non-zero, then
the transformation will require $O(N^2)$ time. If, on the other hand, all matrix
elements $\langle x|H|y\rangle$ of $H$ that are far from the diagonal ($|x - y| > R$) are zero, then the
computational cost is far smaller. I define $b$ as the number of basis states per node
at the lowest (leaf) level $l = 0$. I also define $n_0$ as the number of leaf nodes within
a truncation volume with radius $R_0$. Then the time required to transform $H$ into
the self-similar multiscale basis is of order $O(n_0 b N \ln N)$.

Now I turn to generalized multiplication, which was defined in equation 7.1. I
use the indices $i$, $j$, and $k$ to signify basis states in the multiscale basis. $x_i$ is state
$i$'s position, and $R_i$ is state $i$'s truncation radius. Then the following formula shows
how to calculate the product $C = AB$ of two matrices $A$ and $B$:

$$C_{ij} = \theta(|x_i - x_j| - R_{ij}) \sum_k A_{ik} B_{kj}\, \theta(|x_i - x_k| - R_{ik})\, \theta(|x_k - x_j| - R_{kj})$$
$$R_{ij} = \max(R_i, R_j), \qquad R_{kj} = \max(R_k, R_j), \qquad R_{ik} = \max(R_i, R_k) \tag{8.1}$$

The computational cost of this generalized multiplication is just:

$$\text{cost} = \sum_{i,j,k} \theta(|x_i - x_j| - R_{ij})\, \theta(|x_i - x_k| - R_{ik})\, \theta(|x_k - x_j| - R_{kj}) \tag{8.2}$$
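
Equations 8.1 and 8.2 can be illustrated with a small dense Python sketch. It rests on my reading that each $\theta$ factor keeps a pair of states only when their separation does not exceed the larger of their two truncation radii; it also assumes one-dimensional positions. The function name `generalized_multiply` is hypothetical, and the dense masks are for clarity only: a real implementation would loop over the surviving triples and never form full matrices.

```python
import numpy as np

def generalized_multiply(A, B, x, R):
    """Truncated product C and the number of (i, j, k) triples that contribute."""
    dist = np.abs(x[:, None] - x[None, :])             # |x_i - x_j|
    pair_radius = np.maximum(R[:, None], R[None, :])   # R_ij = max(R_i, R_j)
    mask = (dist <= pair_radius).astype(float)         # 1 where the pair is kept
    # C_ij = mask_ij * sum_k (A_ik mask_ik) (B_kj mask_kj)
    C = mask * ((A * mask) @ (B * mask))
    # cost = sum_ijk mask_ij mask_ik mask_kj
    cost = int(np.einsum("ij,ik,kj->", mask, mask, mask))
    return C, cost

x = np.arange(4, dtype=float)
A = np.ones((4, 4))
B = np.ones((4, 4))
C_local, cost_local = generalized_multiply(A, B, x, R=np.full(4, 1.0))
C_full, cost_full = generalized_multiply(A, B, x, R=np.full(4, 10.0))
```

With radii larger than the system every triple contributes and the ordinary product is recovered; with a radius of one lattice spacing only nearby triples survive. Giving different entries of `R` different values mimics states that live on different levels of the tree.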

I have calculated this cost for the self-similar basis which I just discussed. Remember
that I defined $b$ as the number of basis states per leaf node, and $n_0$ as the number
of leaf nodes within a truncation volume with radius $R_0$. Here are my results:

• If $\beta = 1$, meaning that every node at every level in the tree owns the same
number of basis states, then the cost is $O(b^2 n_0^2 N \ln^2 N)$.

• If $\beta > 1$, meaning that nodes toward the bottom of the tree have more basis
states per node, then generalized multiplication scales as $O(b^2 n_0^2 N)$.

• If $\beta < 1$, meaning that nodes at the top of the tree have more basis states
than nodes at the bottom of the tree, then generalized multiplication scales
much worse, as $O(b^2 n_0^2 N \beta^{-2L})$, with $L$ the number of levels in the tree.

Therefore multiscale bases with $\beta > 1$, i.e. with more basis states per node at the
bottom of the tree, are preferable.

This is the final and most notable result of this section. Its importance lies in
the prefactor $b^2 n_0^2$, which is determined by the smallest length scale in the
problem, the leaf-level truncation radius $R_0$. This contrasts
with O(N) algorithms, which have a single truncation radius that must be set to
be larger than the biggest length scale in the problem. Their prefactor is $b^2 n^2$,
where $n$ is the volume contained within the single truncation radius. The result is
that single-scale multiplication is a factor of $n^2/n_0^2$ slower than multiplication with
a multiscale basis. If the largest length scale is just a factor of ten more than the
smallest length scale, then in a three-dimensional system $n/n_0 = 10^3$ and the
performance difference is a factor of a million.

8.2 Extending the Reach of O(N) Algorithms to Multiple Scales



In the previous section I showed how to create a multi-scale basis and use it to do a
very fast generalized multiplication even in systems with multiple scales. This
means that any O(N) algorithm which is based on generalized multiplication can
be converted - with the aid of a multiscale basis - to an $O(N \ln^2 N)$ algorithm suited to
multiple length scales. The actual algorithms don't change; only the multiplication
and the basis change.

8.3 Renormalization Algorithms 

In this section I give algorithms for numerically renormalizing two linear algebra
problems: the problem of evaluating the Green's function, and the eigenvalue prob- 
lem. By "renormalizing" I mean getting rid of most of the basis, leaving only basis 
elements which correspond to the physics of long length scales. I first show how to 
renormalize away a single length scale. Once one knows how to do this, renormal- 
izing away several length scales becomes trivial, and I will discuss this briefly at the 
end. 

I will divide the basis into two parts: the part I will renormalize away, and the 
part that I will keep. I mathematically express this division by introducing two 
projection operators: $P_-$ projects out the states that I will renormalize away, and
$P_+$ projects out the states that I will keep. This completely exhausts the total basis
states; $P_+ + P_- = 1$.

8.3.1 Two Scale Renormalization of the Green's Function 

The problem is to calculate the Green's function $G = (E - H)^{-1}$. This inversion
typically runs in $O(N^3)$ time. In chapter 7 I artificially truncated $G$ and was thus
able to evaluate it in O(N) time. Here I want to take a different path, and instead
get rid of most of my basis, so that the size $\mathrm{Tr}(P_+)$ of the remaining basis will
be much smaller. If this size is a fraction $\alpha$ of the original basis size $N$, i.e. if
$\mathrm{Tr}(P_+) = \alpha N$, then I will be able to do the inversion required for the Green's
function in $O(\alpha^3 N^3)$ time, which is much faster than $O(N^3)$. The price of this
acceleration is that I will have to do some extra computations to compensate for
throwing out basis states. These compensatory calculations must be approximated
in order to do them in a reasonable amount of time, and I will use the O(N)
algorithms of chapter 7 for the approximation.
I start by defining two new Green's functions:

$$G_+(E) = (P_+(E - H)P_+)^{-1} \tag{8.3}$$

This $+$ Green's function would be the total Green's function if there were
no short distance degrees of freedom, if the $P_+$ states were all there was. It
describes the long distance physics of the problem.

$$G_-(E) = (P_-(E - H)P_-)^{-1} \tag{8.4}$$

This $-$ Green's function describes exclusively the short distance physics of the
problem. It would be the total Green's function if the short distance degrees
of freedom were all there was.

If the Hamiltonian $H$ had no matrix elements between the short-distance states $P_-$
and the long-distance states $P_+$, then the two kinds of states would totally decouple
and the total Green's function would be just the sum of the two Green's functions;
$G(E) = G_+(E) + G_-(E)$. But in general there is no such decoupling.

Our problem, then, is to compute the sum $G(E)$ of the two Green's functions
$G_+$ and $G_-$ when there is a coupling between them. This is just the same summation
problem that I discussed in chapter 7, where I gave four alternative formulas for doing
the summation. Here I choose the Frobenius formula of equation 7.7; in the current
notation it reads:

$$G(E) = G_+ \oplus G_- = G_+ + (1 + G_+ H_0)(P_- - P_- G_- H_0 G_+ H_0 P_-)^{-1}(1 + H_0 G_+) \tag{8.5}$$

I have assumed that the potential $V$ does not couple the $P_-$ states with the $P_+$
states; otherwise one would substitute $H$ for $H_0$ everywhere in equation 8.5. The
inverse operation should be understood as happening only in the $P_-$ basis.

If we are interested only in the long-wavelength matrix elements of the Green's
function, then equation 8.5 simplifies:

$$P_+ G(E) P_+ = G_+ + G_+ H_0 (P_- - P_- G_- H_0 G_+ H_0 P_-)^{-1} H_0 G_+ \tag{8.6}$$

It is interesting to compare equation 8.6 with equation 7.4, which relates the
Green's function $G$ to the scattering matrix $T$:

$$G = G_0 + G_0 T G_0$$

Clearly, from the viewpoint of the $+$ basis, the effect of the $-$ basis is simply to
introduce a scattering term with scattering matrix
equal to $T = H_0 (P_- - P_- G_- H_0 G_+ H_0 P_-)^{-1} H_0$.
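
The structure of equation 8.6 can be checked numerically on a small random matrix. The Python sketch below makes several assumptions of its own: the two sub-bases are simply the first and last index blocks, the full off-diagonal blocks of $H$ play the role of the coupling $H_0$, and the energy is complex so that every inverse exists. It also writes a factor of $G_-$ next to the inverse, as the standard block-inversion (Schur complement) identity requires; whether equation 8.5 absorbs that factor into its notation cannot be confirmed from the text. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_plus, n_minus = 3, 5
N = n_plus + n_minus

H = rng.normal(size=(N, N))
H = (H + H.T) / 2                    # real symmetric Hamiltonian
E = 0.3 + 0.5j                       # complex energy keeps all inverses finite

plus, minus = slice(0, n_plus), slice(n_plus, N)
M = E * np.eye(N) - H
G_exact = np.linalg.inv(M)           # brute-force Green's function

G_plus = np.linalg.inv(M[plus, plus])      # equation 8.3, in the + block
G_minus = np.linalg.inv(M[minus, minus])   # equation 8.4, in the - block
H_pm, H_mp = H[plus, minus], H[minus, plus]

# Inverse taken inside the - block, followed by the G_- factor.
inner = np.linalg.inv(np.eye(n_minus) - G_minus @ H_mp @ G_plus @ H_pm) @ G_minus
T = H_pm @ inner @ H_mp              # effective scattering matrix seen by +
G_pp = G_plus + G_plus @ T @ G_plus  # renormalized + block of G
```

In this sketch `G_pp` agrees with the `+` block of the brute-force inverse to machine precision. In a renormalization calculation the two inverses in the `-` block would be replaced by approximate O(N) evaluations.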

Remember that we have already conceded the necessity of computing $G_+$, which
requires $O(\alpha^3 N^3)$ time. Therefore the only remaining task is to make sure that the
computations in the $P_-$ basis require less than $O(\alpha^3 N^3)$ time. These $P_-$ basis
calculations are:

• Evaluating $G_- = (P_-(E - H)P_-)^{-1}$.

• Evaluating $(P_- - P_- G_- H_0 G_+ H_0 P_-)^{-1}$.

Unfortunately doing these inversions and multiplications exactly would require
$O((1 - \alpha)^3 N^3)$ time, which is too much. Therefore we need to find an appropriate
approximation. Fortunately chapter 7 on O(N) methods developed a plethora of
algorithms for computing inverses in O(N) time! Therefore equation 8.6 gives a
way of computing the renormalized Green's function in $O(\alpha^3 N^3)$ time, where $\alpha$
is the fraction of the original basis states which we want to retain after the
renormalization. For example, if we have renormalized away 80% of the states, then
$\alpha = 0.2$ and we obtain an acceleration of order $1/\alpha^3 = 125$. Not bad.

We can actually obtain a further acceleration by using O(N) algorithms to do
all the multiplications in the $+$ basis and to do the inversion necessary to evaluate
$G_+$. As long as the truncation radius used in the $-$ basis is significantly smaller than
the truncation radius used in the $+$ basis, the renormalized O(N) calculation will be
far faster than an unrenormalized O(N) calculation. We will have turbocharged our
O(N) algorithm by deliberately including the fact that there are two length scales
distinct from the system size: one length scale determines the $P_-$ localization
radius, while the other length scale determines the $P_+$ localization radius.

Note that by using an 0{N) algorithm to evaluate the correction to the Green's 
function, we have required that this correction be nearly local. The same sort 
of assumption is imposed during renormalization of a quantum field theory: one 
requires that the renormalized theory be local. In quantum field theory, just as here,
locality is an assumption that one imposes on the final result in order to be able to 
do the renormalization calculation; it is not something that one can prove from the 
original equations. 

While the Frobenius formula is already known, I believe that I am the first to
suggest accelerating computation of the Green's function by combining the Frobenius
formula with an O(N) algorithm. In their work on hierarchical matrices, Börm,
Grasedyck, and Hackbusch [88] do suggest accelerating calculations of the inverse
by using the Frobenius formula, but they make special assumptions so that computing
$G_-$ is trivial. (I believe that they require that $P_-(E - H)P_-$ be diagonal.)
This is completely different from my suggestion of using an O(N) algorithm to
compute the inverse.

8.3.2 Two Scale Renormalization of the Eigenvalue Problem

I now use a renormalization algorithm to solve the eigenvalue equation
$H|\psi_n\rangle = E_n|\psi_n\rangle$. This renormalization will add a term to the Hamiltonian $H$. This approach
of "renormalizing" the eigenvalue problem is called Kron's problem [92,93]. My
contribution consists in suggesting that one can speed up eigenvalue computations
by combining Kron's problem with an O(N) algorithm.

I will derive the Kron's problem equations rather than just writing them down.
I insert the identity $P_+ + P_- = 1$ into the eigenvalue equation:

$$(P_+ + P_-) H (P_+ + P_-)|\psi_E\rangle = E (P_+ + P_-)|\psi_E\rangle. \tag{8.7}$$

I now expand the terms and simultaneously introduce a simplifying notation
where, for instance, $|\psi_{-,E}\rangle = P_-|\psi_E\rangle$ and $H_{-+} = P_- H P_+$. Since $P_+ P_- = 0$, I obtain two equations:

$$(E - H_{--})|\psi_{-,E}\rangle = H_{-+}|\psi_{+,E}\rangle \tag{8.8}$$
$$H_{+-}|\psi_{-,E}\rangle + H_{++}|\psi_{+,E}\rangle = E|\psi_{+,E}\rangle. \tag{8.9}$$

Equation 8.8 suggests that I could calculate $|\psi_{-,E}\rangle$ from $|\psi_{+,E}\rangle$ if I could invert
the operator $(P_- E P_- - H_{--})$. Clearly this operator is non-invertible in the full
basis $P_+ + P_-$. However, because $P_-$ is a projection operator, I can define an
inverse taken in the $-$ basis; in fact this inverse is just the Green's function $G_-(E)$
that I defined in equation 8.4.

I now use equations 8.4, 8.8, and 8.9 to obtain the renormalized eigenvalue
equation, called Kron's problem:

$$(H_{++} + H_{+-} G_-(E) H_{-+})|\psi_{+,E}\rangle = E|\psi_{+,E}\rangle. \tag{8.10}$$

Once one has computed an eigenvalue $E$ and the corresponding truncated eigenvector
$|\psi_{+,E}\rangle$, the full eigenfunction $|\psi_E\rangle$ can be recovered with the following
equation:

$$|\psi_E\rangle = (P_+ + G_-(E) H_{-+})|\psi_{+,E}\rangle. \tag{8.11}$$

Any eigenstates with support only in the $-$ basis ($P_+|\psi_E\rangle = 0$) have disappeared
from equations 8.10 and 8.11. They can be determined by finding states that solve
the following two equations simultaneously:

$$H_{--}|\psi_{-,E}\rangle = E|\psi_{-,E}\rangle,$$
$$H_{+-}|\psi_{-,E}\rangle = 0. \tag{8.12}$$

This completes the Kron's problem equations. Note that in the literature Kron's
problem includes the generalization to non-orthogonal bases: $H|\psi_n\rangle = E_n O|\psi_n\rangle$
[92].
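
Equations 8.10 and 8.11 can be verified on a small random matrix. The following Python sketch assumes that the $+$ and $-$ sub-bases are the first and last index blocks of a real symmetric $H$, takes an exact eigenpair from a dense solver, and checks that its $+$ component satisfies Kron's problem; the variable names are hypothetical and nothing here is approximated.

```python
import numpy as np

rng = np.random.default_rng(1)
n_plus, n_minus = 3, 4
N = n_plus + n_minus

H = rng.normal(size=(N, N))
H = (H + H.T) / 2
plus, minus = slice(0, n_plus), slice(n_plus, N)

evals, evecs = np.linalg.eigh(H)
E, psi = evals[0], evecs[:, 0]             # exact lowest eigenpair of H

# G_-(E), the inverse of (E - H) restricted to the - block (equation 8.4).
G_minus = np.linalg.inv(E * np.eye(n_minus) - H[minus, minus])

# Left-hand operator of equation 8.10, evaluated at the exact eigenvalue.
H_eff = H[plus, plus] + H[plus, minus] @ G_minus @ H[minus, plus]
psi_plus = psi[plus]
residual = np.linalg.norm(H_eff @ psi_plus - E * psi_plus)

# Equation 8.11: rebuild the - component from the + component.
psi_minus = G_minus @ H[minus, plus] @ psi_plus
```

The residual vanishes to rounding error, and `psi_minus` reproduces the `-` component of the exact eigenvector.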

I will call the new term in equation 8.10 the renormalization Hamiltonian. Note
that it depends on the energy $E$; Kron's problem is therefore not a standard eigenvalue problem.
The physical meaning of the renormalization Hamiltonian can be elucidated if we
interpret the eigenvalue problem as a scattering problem. $H_{++}$ describes scattering
within the $+$ basis, while the additional term describes scattering processes that
start in the $+$ basis, scatter into the $-$ basis, and later scatter back into the $+$
basis. Of course the Kron's problem equations are non-perturbative and not limited
to scattering physics.

If we use an O(N) method to compute the Green's function $G_-(E)$, then
a single computation of the renormalization hamiltonian requires only 0{a^N^) 
time. The problem is that we have to search for eigenvalues E, and because the 
renormalization Hamiltonian depends on $E$, we have to re-compute it for every trial
value of $E$. In particular, if we want to compute all of the $+$ basis eigenvectors, and
we evaluate the renormalization Hamiltonian once per eigenvector, then we will do 
at least 0{a^N^) computations, which is actually worse than the unrenormalized 
0{N^) time if a^N^ > 1. Therefore we must either try to minimize the number of 
evaluations, or else be content to calculate only a few eigenvalues and eigenvectors. 
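
To make the repeated evaluation concrete, here is a minimal Python sketch that finds the lowest eigenvalue by a plain fixed-point iteration on the energy. This iteration is my own illustrative choice, not a method proposed in the text. The test matrix is constructed so that the $-$ states lie far above the $+$ states in energy, which keeps the target eigenstate mostly in the $+$ basis and makes the iteration contract; the function name `kron_lowest` and the shift of 20 are hypothetical.

```python
import numpy as np

def kron_lowest(H, n_plus, tol=1e-11, max_iter=100):
    """Lowest eigenvalue of H from the energy-dependent + block problem."""
    Hpp, Hpm = H[:n_plus, :n_plus], H[:n_plus, n_plus:]
    Hmp, Hmm = H[n_plus:, :n_plus], H[n_plus:, n_plus:]
    identity = np.eye(Hmm.shape[0])
    E = np.linalg.eigvalsh(Hpp)[0]             # first guess: ignore the - basis
    for iteration in range(1, max_iter + 1):
        G_minus = np.linalg.inv(E * identity - Hmm)
        H_eff = Hpp + Hpm @ G_minus @ Hmp      # recomputed for every trial E
        E_new = np.linalg.eigvalsh(H_eff)[0]
        if abs(E_new - E) < tol:
            return E_new, iteration
        E = E_new
    raise RuntimeError("fixed-point iteration did not converge")

rng = np.random.default_rng(2)
H = rng.normal(size=(8, 8))
H = (H + H.T) / 2
H[3:, 3:] += 20.0 * np.eye(5)                  # push the - states to high energy

E_kron, n_evaluations = kron_lowest(H, n_plus=3)
E_exact = np.linalg.eigvalsh(H)[0]
```

Each pass through the loop costs one evaluation of the renormalization Hamiltonian, so `n_evaluations` is exactly the quantity one wants to keep small. When the target state is strongly mixed between the two sub-bases this simple iteration can converge slowly or fail.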

One way to minimize the number of evaluations of the renormalization Hamiltonian
is to calculate its derivatives and then extrapolate its behavior using a Taylor
series. Such an extrapolation will fail if $E$ is close to one of $G_-(E)$'s poles.
Therefore good performance can be obtained only if almost all of the poles of $G_-(E)$ lie
far from the eigenvalues of the $+$ basis eigenvectors. This condition is equivalent
to requiring that few eigenstates of the original eigenvalue problem be highly mixed
between the $P_+$ and $P_-$ bases. Good performance is not possible without a good
choice of basis.

8.3.3 Renormalizing Away Multiple Scales 

In the previous two sections I showed how to renormalize away a single scale. I did
this by dividing the basis up into two parts $P_+$ and $P_-$ and using the small
truncation radius of the $-$ basis states. This approach can be trivially extended to
doing several successive renormalizations of several different length scales. Consider
as a given a multiscale basis of the type discussed in section 8.1. Each level $l$ in
the tree has a corresponding projection operator $P_l$ and truncation radius $R_l$. One
starts by renormalizing away the lowest level, the leaves. Then one renormalizes the
next level up, and so on. As long as the truncation radii increase substantially at
each step up the tree, the cost of renormalization will be insignificant compared to
that of the calculations in the final basis.



8.4 Linear - Nonlinear 

In section 8.3.2 I described an algorithm for solving the eigenvalue problem. If the
truncation radius were made infinitely large, one would obtain the same solutions
that would be obtained by solving the original eigenvalue problem. These solutions
obey all the standard results from linear algebra; for instance the eigenvectors are
guaranteed to be orthogonal. If, on the other hand, the truncation radius is finite,
then the solutions obtained are no longer guaranteed to obey any of the standard
theorems. One is no longer solving the eigenvalue problem, but instead solving
something that is "like" the eigenvalue problem. The new problem is nonlinear: it
explicitly contains a new term which depends on the energy $E$ in a non-trivial way.
There are considerable conceptual difficulties here, in understanding in what sense
a non-linear problem can be "like" a linear problem.

The same difficulty appears when trying to understand the algorithm for
renormalizing the Green's function. The original definition of the Green's function as
$G(E) = (E - H)^{-1}$ guarantees the existence of poles in the Green's function at
the eigenvalues of $H$. In contrast, the existence of these poles in the renormalized
Green's function is not guaranteed if one uses an approximate O(N) algorithm to
evaluate the renormalization term. The poles in the exact Green's function play a
very important part in any physical interpretation which is given to it; one is left
wondering how to interpret the approximate renormalized Green's function.

Perhaps the possible lack of poles in the Green's function was to be expected. 
The imposition of the truncation radius R could have a severe effect on any long- 
distance physics. In particular, any eigenstates of $H$ which are extended throughout
the system could be destroyed by the truncation, which implies that their poles would 
be deleted from the Green's function. One can conclude that any algorithm which 
purposefully neglects certain long distance physics must be able to delete poles from 
the Green's function. Indeed, conventional 0{N) algorithms not based on renor- 
malization also do this sort of eigenfunction deletion. For instance, basis truncation 
algorithms explicitly limit the eigenfunctions to stay inside of the truncation radius, 
and thus delete any extended eigenfunctions. 

I am reminded of quantum field theory, where a pole in the Green's function 
implies the existence of a physical asymptotic state. Therefore, if the QCD color 
field is truly confined, then the gluon propagator must not have any poles. In fact 
debate continues about the structure of the gluon propagator [94]. It is difficult
to understand what a propagator is if it does not propagate anything. Perhaps an
investigation of the behavior of the renormalized algorithms presented here, and in
particular of what they do to poles, could shed some light on these confusing issues
in QCD confinement.

8.4.0.1 A Generalized Schrödinger Equation

Often using a continuum is simpler than using a discrete lattice. Therefore I
propose a continuum model which mimics the peculiar behavior of the renormalization
algorithms. My model is just a generalization of the Schrödinger equation.

The Schrödinger equation can be written as a combination of the free propagator
and a potential:

$$(G_0^{-1}(E) + V)|\psi_E\rangle = 0, \qquad G_0(E) = (E - \nabla^2)^{-1} \tag{8.13}$$

The free Green's function $G_0$ is not localized, and contains a pole for each
momentum state, or in fact a cut along the positive real energy axis. However I can
modify the Schrödinger equation, giving the momentum states a range by smearing
the Green's function. In fact, with a proper choice of smearing I can give higher
momentum states a longer range than the low momentum states. Here is one way
to define the smeared Green's function:

$$\langle \vec{x}|\tilde{G}_0|\vec{y}\rangle = \int d\vec{k}\; f(E, |\vec{k}|, |\vec{x} - \vec{y}|)\, \langle \vec{x}|G_0|\vec{k}\rangle \langle \vec{k}|\vec{y}\rangle \tag{8.14}$$

$f$ is a function describing the localization cutoff. If $f = 1$, then the Green's function
is not smeared and Schrödinger's equation is left unmodified. Non-constant $f$'s
smear the free Green's function. For example, if I choose a Gaussian smearing
function $f = e^{-a^2 |\vec{k}|^2 |\vec{x} - \vec{y}|^2}$, then a state with momentum $\vec{k}$ can only propagate a
distance of about $\frac{1}{a|\vec{k}|}$.

With the smeared Green's function $\tilde{G}_0$ in hand, I can write a generalized Schrödinger
equation:

$$(\tilde{G}_0^{-1}(E) + V)|\psi\rangle = 0 \tag{8.15}$$

The generalized Schrödinger equation mimics many of the problematic aspects
of renormalization algorithms. It favors states which are not spatially extended. The
poles in the Green's function have been smeared around and may no longer exist,
in which case the generalized Schrödinger equation could have a different number
of solutions than the original Schrödinger equation. The generalized Schrödinger
equation can also be made explicitly non-linear in $E$ (just like the renormalized
Hamiltonian obtained when renormalizing the eigenvalue equation) by making the
cutoff function $f$ depend explicitly on $E$.

The generalized Schrödinger equation allows examination of interesting test
cases. For instance, consider a potential $V$ with two wells, and with solutions that
tunnel between the two wells. Will the generalized Schrödinger equation preserve
the tunneling solutions, or will it find two distinct solutions for the two wells, or will
it do something else entirely? This sort of calculation could and should be pursued,
as a way of developing a physical intuition about the effect of imposed distance
cutoffs on linear algebra equations.

I have never seen equations 8.14 and 8.15 in the literature, and I believe that
they are my own original contribution.

8.5 Review of Multi-Scale Algorithms

Here I review the existing numerical algorithms for solving linear problems with
multiple length scales: hierarchical matrices, domain decomposition, pseudopotentials,
and multigrid algorithms.









8.5.1 Hierarchical Matrices 

Recently Hackbusch introduced a combination of mathematical theory and computational
algorithms which he calls $\mathcal{H}$-matrices, or hierarchical matrices [88,95].
I have only skimmed through some of his articles, but my impression is that he 
introduces a basis which is similar to the multiscale bases I have discussed here. He 
then truncates matrices in a way that is closely tied to his basis. Hackbusch uses 
a lot of mathematical notation to describe his basis and truncation scheme, and I 
have not yet understood these. However, my impression is that his end result after 
truncating a matrix is similar to what I would obtain if I, while working inside of 
a multiscale basis, truncated all the matrix elements that were farther apart than 
the level-dependent truncation radius. I suspect therefore that there is some over- 
lap between Hackbusch's ideas and mine. Where there is overlap, he would have 
precedence because he started publishing articles on his ideas by 1999. 

In any case I am quite sure that there are also a lot of differences between
our approaches. I examined Hackbusch's algorithm for matrix inversion and it seems
very simplistic. He also does not discuss or apply many of the physics ideas of
localization, O(N) algorithms, et cetera.

8.5.2 Domain Decomposition 

Domain decomposition algorithms are a particular approach to solving linear
equations, and are based on the idea of breaking systems into parts. As such, they can be
understood as two-scale algorithms: there is the length scale of the individual parts, 
and the length scale of the whole system. Domain decomposition algorithms have 
been in use for a long time and are still being actively developed by a large research 
community. 

I focus here on a specific approach to domain decomposition based on Schur's
complement. First one separates the degrees of freedom into a coarse subset and
a finer subset; this can be represented with the projection operators $P + Q = 1$.
Then one notes that solving the linear system $Ax = b$ is equivalent to solving
a modified equation in sub-basis $P$: $\tilde{A} P x = \tilde{b}$, where the Schur's complement
$\tilde{A}$ is defined by $\tilde{A} = PAP - PAQ A^{-1} QAP$, and the input vector $\tilde{b}$ is defined by
$\tilde{b} = Pb - PAQ A^{-1} Qb$ [92]. (Here $A^{-1}$, sandwiched between the $Q$ projectors,
should be read as the inverse of $QAQ$ taken within the $Q$ sub-basis.) One then solves
the modified equation using a suitable iterative algorithm. The Schur's complement
itself never has to be calculated explicitly; instead at each iteration a linear
equation is solved to obtain $PAQ A^{-1} QAP x_i$.

The removed sites described by the operator P are always chosen to form sur- 
faces dividing the remaining sites Q into independent domains. The net result 
is that the system is divided into n different linear systems which can be solved 
separately. 
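
The Schur's complement reduction can be checked with a short Python sketch. It assumes that the $P$ and $Q$ subsets are the first and last index blocks of a well-conditioned random matrix, and it forms the Schur's complement explicitly with dense inverses purely for illustration; as noted above, a practical code would instead solve a linear equation in the $Q$ block at every iteration. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_p, n_q = 3, 6
N = n_p + n_q

A = rng.normal(size=(N, N)) + N * np.eye(N)   # diagonal shift keeps A invertible
b = rng.normal(size=N)
p, q = slice(0, n_p), slice(n_p, N)

A_qq_inv = np.linalg.inv(A[q, q])             # inverse within the Q sub-basis

# Schur's complement and modified right-hand side in the P sub-basis.
A_schur = A[p, p] - A[p, q] @ A_qq_inv @ A[q, p]
b_mod = b[p] - A[p, q] @ A_qq_inv @ b[q]

x_p = np.linalg.solve(A_schur, b_mod)         # solve the reduced problem
x_q = A_qq_inv @ (b[q] - A[q, p] @ x_p)       # back-substitute for the Q part
x = np.concatenate([x_p, x_q])
```

The assembled `x` solves the original system `A @ x = b`. If the `Q` block is itself block diagonal, as it is when `P` forms separating surfaces, the inverse within `Q` splits into independent solves, one per domain.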

8.5.2.1 Domain Decomposition for the Eigenvalue Problem 

Some research has been done on applying domain decomposition ideas to eigenvalue 
problems. Many algorithms for solving the eigenvalue problem involve solving linear 
equations; one can try to use domain decomposition to speed up the the solution of 
these linear equations [92]. Alternatively, one can use the Kron's problem approach 
which I explained in section 8.3. Lastly, one can solve sub-domains separately and 
then combine them using a Rayleigh-Ritz variational principle [92,96]. 

8.5.2.2 Domain Decomposition for Lattice QCD 

Only recently was an article published that suggested that a Schur's complement 
algorithm can be applied to lattice gauge theory [97]. The author suggested that 
the inverse $QA^{-1}Q$ could be estimated cheaply by first expanding it as a Taylor 
series in A and then estimating the Taylor series coefficients stochastically. 

More recently Lüscher suggested a domain decomposition algorithm which seems 
to be more promising, though it still needs to be fleshed out [98,99]. 

8.5.3 Pseudopotentials 

Molecular and condensed matter systems exhibit two or more length scales: one for 
the valence electrons (which may or may not be localized), and then much smaller 
length scales, one for each shell of core electrons. However, there is something 
very special about this problem compared to the multi-scale problem addressed in 
this chapter: the core electrons occupy volumes quite a bit smaller than the atomic 
volume, so they do not interact significantly with other atoms. Therefore, when 
calculating behavior at a given site, you can ignore all core electrons except those 
at your own site. 

The extreme locality of the core electrons allows tremendous simplification; many 
years ago the condensed matter community performed that simplification by devel- 
oping norm-conserving pseudopotentials [100,101]. In pseudopotential calculations, 
the core electrons are totally eliminated from the calculation, and the potential felt 
by the valence electrons is modified to make up for the lack of the core electrons. 
This pseudopotential approach is very similar to Wilson's approach to renormal- 
ization, where an effective interaction is introduced by integrating out high energy 
degrees of freedom. There is, however, a large difference: pseudopotentials can 
be calculated from first principles, while effective Lagrangians are fit to data. By 
now pseudopotentials are universally accepted, quite stable, and give very high 
accuracy results for most calculations. In fact, without pseudopotentials modern 
electronic structure calculations would not be possible: they are the main way of 
coping with the physics of core electrons. 

There are two difficulties which prevent one from generalizing the pseudopo- 
tential approach to other multiscale problems. First, pseudopotentials are unable 
to model interactions between one atom's core electrons and another atom's core 
electrons, and therefore require that the degrees of freedom to be renormalized away 
be spatially separated from each other and not interact with each other. The other 
difficulty is that pseudopotentials are hard to calculate in problems without spher- 
ical symmetry, simply because of the huge computational resources that would be 
required if more than two length scales are involved. For problems without a sim- 
plifying symmetry, pseudopotential methods allow in practice only one extra length 
scale. It is our good luck that rotational symmetry is well preserved throughout the 
periodic table, and that we are therefore able to calculate atomic pseudopotentials 
for atoms with many shells and therefore many length scales. 

8.5.4 Multigrid Algorithms 

Multigrid and multilevel algorithms began with a landmark paper in 1977 [102], and 
have since taken over in many disciplines. They are competitive options basically 
anywhere that partial differential equations must be solved [93, 103, 104]. Although 
there is a distinction between multigrid and multilevel algorithms, they all share the 
same philosophy; for simplicity I will here apply the word multigrid to them all. 

Historically, the multigrid approach has found the most success when applied to 
the problem of solving the linear system Ax = b, where x is unknown. All partial 
differential equations can be reduced to linear systems by discretizing space, and 
then linearizing any non-linearities. (One then adds the non-linearities back in via an 
iterative process.) Because multigrid work is focused on solving linear systems, I will 
first focus on linear systems and only later discuss the eigenvalue problem. Linear 
systems with large numbers of variables are usually solved using Krylov subspace 
algorithms; i.e. algorithms where the i-th approximation is computed by applying a 
polynomial in A of order i to the residual $b - Ax_0$ [105]. Krylov subspace 
algorithms are vulnerable to very long convergence times when the smallest eigenvalue 
of the matrix A in Ax = b is small. The matrices A generated by discretizing a partial 
differential equation have 
smallest eigenvalues which are exponentially small as the lattice spacing decreases; 
Krylov subspace algorithms show a corresponding exponential growth in convergence 
time. This problem is called critical slowing down. Of course the smallest 
eigenvalue may always be increased by an appropriate basis transformation (this is 
called preconditioning), but finding a good transformation can be as challenging as 
the original problem [105-108]. Thus critical slowing down is a key motivator for the 
development of alternatives to Krylov subspace algorithms, in particular multigrid 
algorithms. 
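
The slowdown can be seen in a small experiment. The following is a minimal sketch, not a benchmark: it assumes a one-dimensional discretized Laplacian as the model matrix and uses a textbook conjugate gradient loop as a representative Krylov subspace algorithm. The helper names `laplacian_1d` and `cg_iterations` are hypothetical. It only demonstrates that the iteration count grows as the lattice is refined; it makes no claim about the precise growth law.

```python
import numpy as np

def laplacian_1d(n):
    """Discretized -d^2/dx^2 on n interior sites of the unit interval."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def cg_iterations(A, b, tol=1e-8, max_iter=10_000):
    """Plain conjugate gradient; returns (solution, iterations used)."""
    x = np.zeros_like(b)
    r = b - A @ x            # initial residual
    p = r.copy()             # initial search direction
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:
            return x, k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

# Refine the lattice (more sites, smaller spacing) and count iterations.
counts = {}
for n in (16, 32, 64):
    A = laplacian_1d(n)
    b = np.ones(n)
    x, counts[n] = cg_iterations(A, b)
    print(n, counts[n])
```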

Multigrid algorithms begin by separating out the length scales in the problem. 
Having established the various length scales, these algorithms obtain a separate 
solution of the linear system at each scale, and then combine the various scales 
together to obtain the final solution. The implementation of this strategy differs a 
lot from algorithm to algorithm, and I will here review some of the most significant 
approaches. 

8.5.4.1 Geometric Multigrid 

"Geometric multigrid" algorithms were developed first and have seen the most 
widespread acceptance [103]. They assume that the degrees of freedom are dis- 
tributed in a regular lattice, which is called the fine grid. Geometric multigrid 
derives the longer length scales by organizing the sites on the fine lattice into blocks 
of width w and volume $w^D$. It then thinks of the lattice of blocks as a (coarser) 
lattice in its own right. This process of creating a coarser lattice obviously can be 
repeated: the result is a hierarchy of lattices, with the coarsest lattice having only 
one site, and the finest lattice being the original lattice. The degrees of freedom at 
each level in this hierarchy are represented by the vectors $x^i$, where $x^n$ represents 
the fine grid and $x^1$ represents the coarsest grid. This process of building a hierarchy 
corresponds exactly to the procedure of building trees which I described in section 
lO 

Having created the hierarchy of lattices, one must relate them to each other. Ge- 
ometric multigrid does this with two linear operators: the "restriction" (averaging) 
operator R which maps fine grids to coarser grids via $x^{i-1} = Rx^i$, and the "prolongation" 
(interpolation) operator P which maps coarse grids to finer ones [109, 110]. 
Additionally, there must be a rule for deriving a coarse grid version of the matrix A; 
a popular option (used in the Galerkin coarse grid approximation) is $A^{i-1} = RA^iP$. 
The choice of R, P, and A varies from algorithm to algorithm. 

Using these relations between the various grids, geometric multigrid pursues a 
strategy of iterative refinement of the solution on each grid. However, there 
is a very special prescription for when geometric multigrid algorithms refine which 
grid, and when they move results from one grid to another. Specifically, they start 
by refining the solution on the finest grid, and then restrict to the next finest grid 
(i.e. map $x^n$ to $x^{n-1}$). They then repeat the process of refining and restricting 
until they arrive at the coarsest grid. At this point they then move back out to the 
finest grid in a process of refining and prolonging [109,110]. For many problems, 
this sort of algorithm converges within a very small number of iterations, giving a 
total computational cost of order $O(N \ln N)$; i.e. the exponential cost associated 
with critical slowing down is totally avoided. This speedup has been both predicted 
theoretically for simple problems [109] and observed in practice for much more 
complicated problems. 
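
The refine, restrict, refine, prolong pattern described above can be sketched for a one-dimensional model problem. This is a minimal illustration under several assumptions that are not taken from the text: weighted Jacobi sweeps as the refinement step, linear interpolation as the prolongation, restriction equal to half its transpose, the Galerkin rule for the coarse matrices, and the common "correction scheme" variant in which the residual (rather than the solution vector itself) is restricted. The names `prolongation` and `v_cycle` are hypothetical.

```python
import numpy as np

def prolongation(n_coarse):
    """Linear interpolation from n_coarse sites to 2*n_coarse + 1 sites."""
    n_fine = 2 * n_coarse + 1
    P = np.zeros((n_fine, n_coarse))
    for j in range(n_coarse):
        i = 2 * j + 1            # fine site coinciding with coarse site j
        P[i, j] = 1.0
        P[i - 1, j] = 0.5
        P[i + 1, j] = 0.5
    return P

def v_cycle(A, b, x, n_smooth=3, omega=2.0 / 3.0):
    """One sweep from the current grid down to the coarsest grid and back."""
    n = A.shape[0]
    if n <= 3:
        return np.linalg.solve(A, b)          # coarsest grid: solve directly
    d = np.diag(A)
    for _ in range(n_smooth):                 # refine on this grid
        x = x + omega * (b - A @ x) / d
    P = prolongation((n - 1) // 2)
    R = 0.5 * P.T                             # restriction (averaging)
    A_coarse = R @ A @ P                      # Galerkin coarse grid matrix
    r_coarse = R @ (b - A @ x)                # restrict the residual
    e_coarse = v_cycle(A_coarse, r_coarse, np.zeros(len(r_coarse)),
                       n_smooth, omega)
    x = x + P @ e_coarse                      # prolong the correction
    for _ in range(n_smooth):                 # refine again on the way back
        x = x + omega * (b - A @ x) / d
    return x

# Model problem: discretized Laplacian on 63 sites (63 -> 31 -> 15 -> 7 -> 3).
n = 63
h = 1.0 / (n + 1)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
b = np.random.default_rng(0).standard_normal(n)
x = np.zeros(n)
history = []
for cycle in range(10):
    x = v_cycle(A, b, x)
    history.append(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
print(history)
```

The list `history` records the relative residual after each cycle, which is the quantity one would watch to judge how many cycles are needed.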

Theoretical analysis reveals that fast convergence is very tied to making an 
appropriate choice of restriction and prolongation operators R and P [109]. This is 
often talked about as an issue of "smoothness," because if the matrix A is spatially 
isotropic then the choice of R and P is essentially trivial. The root problem is that 
the restriction operator R must effectively isolate the problem's long-wavelength 
behavior from its short-wavelength behavior, and the prolongation operator P must 
in turn do a good job of interpolating the long-wavelength behavior back to the finer 
grid. Therefore the choice of R and P must be informed by a precise understanding 
of the long distance and short distance degrees of freedom and a good way of 
dividing them up. The degrees of freedom and their division are determined by 
the matrix A, so making a suitable choice of R and P can be as time-consuming 
as diagonalizing A. In practice, multigrid algorithms are most successful when the 
matrix A's behavior is dominated by terms that are well understood, like derivatives. 

An additional problem with geometric multigrid is that it requires that the system 
be a regular lattice. This is problematic in problems with irregular boundaries 
[103, 104], and in problems like quantum chemistry where sites are free to move 
about. For such problems a more flexible blocking algorithm is required. The 
choice of blocking can be as challenging as the original problem; multigrid algorithms 
with flexible blocking typically spend more time blocking than solving the blocked 
equations [103]. Nonetheless they can often give far faster solutions than Krylov 
space algorithms. 

8.5.4.2 Multigrid for Non-Uniform Systems 

"Algebraic" multigrid algorithms are designed specifically to solve the problems of 
geometric multigrid [104]. Instead of blocking, they select out a subset of the fine- 
scale sites to be used in the coarse-scale problem. Heuristics determine this selection 
procedure, as well as the choice of prolongation and multigrid operators. While 
algebraic multigrid algorithms are still immature, they have already had significant 
success with highly anisotropic problems like the Navier-Stokes equation. Highly 
disordered problems seem more problematic. 

8.5.4.3 Multigrid for the Eigenvalue Problem 

Multigrid algorithms are easily generalized to computing lowest-energy eigenstates 
of Schrödinger's equation [93, 111]. If q eigenstates are needed and N is the system 
volume, these algorithms scale as either $qN$ or $q^2N$, depending on the algorithm. 
I am not sure whether these scaling laws are preserved if q becomes large. Also, 
these estimates might be reduced if the eigenstates being calculated are localized 
within volumes smaller than the system volume; several recent papers have reported 
results in this vein but I have not read them. 

8.5.4.4 Multigrid for Unquenched QCD 

The most commonly used algorithm for calculating unquenched QCD is Hybrid 
Monte Carlo, which introduces new degrees of freedom representing the fermions 
[112]. These degrees of freedom evolve according to forces reflecting the gauge 
fields, and the forces are calculated by solving a linear system Ax = b. Solving 
this system remains the dominant computational bottleneck for unquenched calcu- 
lations [113]. Hybrid Monte Carlo's biggest competitor is probably the Local Boson 
Approximation, which again faces a computational bottleneck of evaluating forces 
by solving a linear system [113, 114]. 

A number of researchers have tried to apply multigrid ideas to this challenge 
[110,115,116]. Typical of these efforts is the "Ground-state projection" multigrid 
algorithm, tailored to disordered problems [117]. Like geometric multigrid, ground- 
state projection multigrid assumes a regular lattice and normal blocking. However 
it diverges from geometric multigrid by customizing the restriction operator on a 
block by block basis: it solves the eigenvalue problem Ax = Ex on each block 
individually, and uses the ground state of each block as the restriction operator 
R. The idea is that the presence of disorder requires a custom treatment of each 
block, and that a given block's ground state best separates out the block's long 
distance physics from its short distance physics. This is essentially just a heuristic 
algorithm, with little theoretical justification. In fact nobody really understands 
how to separate short and long distance degrees of freedom in strongly disordered 
systems. 
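
The block-by-block construction of the restriction operator can be sketched as follows. This is only an illustration under stated assumptions: the one-dimensional disordered matrix, the function name `ground_state_restriction`, the choice of the prolongation as the transpose of R, and the Galerkin rule for the coarse matrix are all hypothetical choices made for the example, not details taken from [117].

```python
import numpy as np

def ground_state_restriction(A, block_width):
    """Build R whose rows are the ground states of A restricted to each block."""
    n = A.shape[0]
    assert n % block_width == 0, "the lattice must divide evenly into blocks"
    n_blocks = n // block_width
    R = np.zeros((n_blocks, n))
    for k in range(n_blocks):
        sites = slice(k * block_width, (k + 1) * block_width)
        # Solve the eigenvalue problem A x = E x on this block alone.
        energies, states = np.linalg.eigh(A[sites, sites])
        # eigh returns eigenvalues in ascending order, so column 0 is the
        # block's ground state.
        R[k, sites] = states[:, 0]
    return R

# Hypothetical disordered operator: nearest-neighbor hopping plus a random
# on-site term.
rng = np.random.default_rng(1)
n, w = 32, 4
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
     + np.diag(rng.uniform(0.0, 1.0, size=n)))
R = ground_state_restriction(A, w)
P = R.T                    # assumption: prolongation is the transpose of R
A_coarse = R @ A @ P       # assumption: Galerkin coarse grid matrix
print(A_coarse.shape)
```

Because the blocks do not overlap and each ground state is normalized, the rows of R are orthonormal, and the diagonal of the coarse matrix holds the ground state energies of the individual blocks.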

In 1990-92 four lattice groups around the world tried to use this and other multi- 
grid algorithms to accelerate QCD calculations [117]. They had enough difficulties 
that research ceased by 1994 or so. 

This completes my review of already developed algorithms optimized for two or 
more length scales. 



Chapter 9 



Reliability and Reproducibility of Computational Results in Physics 

9.1 Introduction 

Ours is an era of specialization: professionals establish a specialty and usually stay 
within that, remaining at an amateur's level in all other disciplines. Specialization 
creates divisions between disciplines, so that chemists (for instance) communicate 
mostly with other chemists, and often even restrict themselves to a particular type 
of chemistry. Even when the advances of one discipline can be of substantial aid to 
another discipline, they may remain unknown, unused, ignored. 

This paper concerns the disciplines of physics and computers. Long ago com- 
puter professionals, particularly those who risked retribution from clients if their 
computations produced incorrect results, discovered certain systematic difficulties 
afflicting any computing, and began developing an expertise to manage and amelio- 
rate them. Specifically, it is very difficult to ensure the reproducibility and reliability 
of a computation, or to understand and manage the complexity of a computer. Put 
another way, computers are a bona fide, real life example of Murphy's law: they 
have a very strong natural tendency to always do the wrong thing. It is very difficult 
to provide the configuration, programming, and inputs that are necessary to obtain 
correct results. Moreover, it is always impossible to be sure whether the results 
are correct without obtaining the same results in an independent way not involving 
computers. 

In this paper I will show that although the physics community has adopted the 
computer for both experimental and theoretical research, it (with some exceptions) 
has not learned to respect the gross difficulties inherent in computing, and does not 
practice even the most basic disciplines which the computing profession has developed 
to manage these difficulties. I will point out that this "sorcerer's apprentice" 
approach to computing can cause both daily difficulties and long term errors. I end 
by providing a list of recommendations and resources for both individual physicists 
and institutions interested in improving the quality of physics research. 
This paper will be of personal interest if the reader has ever: 

• had difficulty remembering the exact parameters, settings, procedures, or data 
he or she has used to produce a graph or result. 

• wanted to reuse graphs or results produced by another physicist, or to pro- 
duce somewhat different results by adapting the other physicist's software and 
configuration, but found this task too difficult to be practical. 

• wondered whether a graph or numerical result could be trusted. 

• been unsure how to estimate the accuracy or certainty of a result. 

The author comes from a perspective of having pursued careers in both com- 
puting and physics. He has both undergraduate and master's degrees in physics, 
which were accompanied by research assistantships implementing physics software. 
This education was followed by a career in computing, focusing on the reliability of 
software used by large enterprises. More recently, the author returned to physics. 
To a computing professional, certain aspects of the physics community's usage of 
computers seem very puzzling. This paper is the result of an effort to research 
the physics community's computing practices, to understand the reasoning behind 
them, and to find resources and best practices that might help. 

Because the author's expertise is limited to physics and computing, this article 
focuses specifically on the physics community. However, it is probable that most of 
the material applies equally well to the other sciences, even softer sciences [118]. 

9.2 Problems That Come With Computing 

9.2.1 Reproducibility 

It is very difficult to create software whose behavior is reproducible; i.e. that does 
the same thing every time you run it. The reason is that computers are based on 
binary arithmetic, which is as nonlinear as one can imagine. Nonlinear systems are 
extremely sensitive to initial conditions; if even one bit in the program itself or in its 
data or environment is different, then its execution can easily give a totally different 
result. 

Of course, computers are - theoretically - deterministic, so that if you run exactly 
the same program twice, with exactly the same environment, and with exactly the 
same data, it must produce exactly the same result. In this very theoretical sense 
every program is reproducible. But as a matter of objective fact, the conditions for 
deterministic behavior are never fulfilled, even when running repeatedly on the same 
computer. A program always runs with the assistance of an operating system, and 
operating systems give their hosted programs different environments at each run - 
they change the memory location in which a program is stored, they schedule CPU 
allocation differently, et cetera. Moreover, the information in a computer's disk and 
memory, as well as the environment outside of the computer, certainly will change 
from run to run. 

Dependency on the environment can be ameliorated by adopting conventions 
which insulate the program from its environment. Operating systems insulate pro- 
grams from their position in memory by rewriting them during the load into memory. 
Standard software layers (APIs) insulate programs from the details of where data 
is stored. However these and other technologies are fundamentally limited: first, 
because there are many situations where software needs to act differently depend- 
ing on its environment, and secondly, because the insulation is not foolproof. For 
instance, no file system API can insulate programs from a disk full condition. A 
second, more subtle difficulty is that these insulation techniques are vulnerable to 
bugs in the operating system, the hosted program, and in the APIs themselves. I 
will come to the difficulty of bugs in section 9.2.2 on reliability. 

Leaving aside changes in the environment, reproducibility still requires complete 
control over the program itself: the program's source code, its compilation process 
and any other ingredients in production of the executable, the program's configura- 
tion, and the program's inputs. One must have complete records of these variables 
for each run, and then be able to recreate the exact same variables at will, be- 
fore one can hope to obtain reproducibility of results. Although this is a herculean 
task, professional software developers find it absolutely necessary, because (a) soft- 
ware development halts as soon as programmers stop being able to collaborate with 
each other, and (b) user satisfaction plummets when the software does not run. 
Therefore, every professional software development organization devotes a signifi- 
cant percentage of its resources to recording and managing its code, compilation 
process, configurations, and input data. 
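
As a small illustration of what keeping "complete records of these variables" might look like for a single run, the sketch below fingerprints the source code, configuration, and input data and stores them alongside basic environment information. It is a minimal sketch only: the function names, the dictionary keys, and the example configuration are hypothetical, and a record of this kind complements rather than replaces the source code control practices described in the text.

```python
import hashlib
import json
import platform
import sys

def sha256_of_bytes(data: bytes) -> str:
    """Return the hexadecimal SHA-256 fingerprint of a byte string."""
    return hashlib.sha256(data).hexdigest()

def build_manifest(source_text: str, config: dict, input_bytes: bytes) -> dict:
    """Collect the ingredients of one run so that it can be recreated later."""
    config_json = json.dumps(config, sort_keys=True)   # stable key ordering
    return {
        "source_sha256": sha256_of_bytes(source_text.encode("utf-8")),
        "config": config,
        "config_sha256": sha256_of_bytes(config_json.encode("utf-8")),
        "input_sha256": sha256_of_bytes(input_bytes),
        "python_version": sys.version,
        "platform": platform.platform(),
    }

# Hypothetical run: in practice the source text and input bytes would be read
# from the files actually used to produce a result.
manifest = build_manifest(
    source_text="print('hello')",
    config={"lattice_width": 20, "seed": 12345},
    input_bytes=b"1.0 2.0 3.0\n",
)
print(json.dumps(manifest, indent=2))
```

Storing such a manifest next to every graph or table makes it possible to detect, later on, that even a single character of the program or its inputs has changed.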

In order to give an idea of the extremity of the reproducibility challenge, I will now 
briefly sketch some of the most basic steps that computing professionals routinely 
take to surmount it. Typically there is a computer dedicated entirely to running a 
Source Code Control program. For those not familiar with this terminology, I mean 
a program designed to store detailed information about other programs. Whenever 
even the tiniest change is made to a program's source code, compilation scripts, 
configuration files, or inputs, the result is thought of as a new version of the entire 
program. A permanent record of every single individual version of the program is 
recorded in source code control, so that one can obtain old versions on demand. 
The source code control database is, of course, backed up on a regular basis. 

In order to know whether one's software is reproducible, one must compile, 
run, and test it. Human efforts to compile, run, and test programs are inherently 
non-reproducible; therefore various scripting technologies like "make" files are used 
to automate the process. Before recording a change in source code control, one is 
required to do an automated compilation and test of the new version, and to obtain a 
supervisor's approval. However this requirement is not nearly foolproof, so typically 
the program is recompiled and re-tested every day using automation scripts which 
are also stored in source code control. The daily compilation (and tests) are done 
on computers which are dedicated to this purpose, and which are configured solely 
from source code control. Since the compilation server is clean of any changes 
not stored in source code control, one hopes that the compilation process will be 
reproducible. The compilation process is, of course, closely monitored, and much 
of the organization's staff is considered to be permanently on call to quickly resolve 
problems. 

These disciplines require a constant vigilance from every staff member, unremit- 
ting pressure from management, the existence of some staff tasked with maintaining 
and monitoring the whole process, and a constant and large expense in both time 
and resources. Yet even when implementing these practices, an average software 
development organization struggles daily to achieve reproducibility, and fairly reg- 
ularly fails to obtain it. For instance, a new version which runs successfully on a 
developer's computer often fails to compile on the official compilation server. 

Although later in this paper I will recommend adoption of some (but not all) 
of these technologies and practices to physicists, that is not my point here. My 
goal is simply to point out that these organizations - often private enterprises under 
extreme pressure to save money and time - find reproducibility so unreachable that 
they regularly go to extremes to attempt it, and often still fail. The problem is not 
lack of expertise or experience: these organizations are composed largely of persons 
trained in computer science, with years or decades of experience. Nor is the problem 
one that a physicist, imagining herself smarter, could hope to be immune to: private 
computing firms try very hard to hire the smartest people including smart physicists, 
offer immense salaries, and often they offer careers that are more attractive than a 
physics career. There is no reason to think that physicists will have fewer problems 
with reproducibility than computing professionals do; there is plenty of reason to 
think that they will have more. 

So far I have discussed the difficulty of getting a single program to give repro- 
ducible results. I now briefly mention that two programs written to do the same 
thing will as a rule exhibit different behaviors. Hatton and Roberts [119] compared 
nine seismic analysis codes which convert echo data into an image of the earth's 
near surface structure. These codes had typically been in regular use for fifteen 
years or more, and on average each contained 750,000 lines of code. There is a 
large incentive to ensure that these codes produce correct results: they are used to 
make multimillion dollar choices about oil drilling. Hatton and Roberts identified 
34 processing functions (on average about 150,000 lines of code) which were shared 
by all nine codes. Fourteen of these functions were implementations of the same 
published algorithms; they could be expected to give identical results when applied 
in sequence to a data set. Instead the discrepancy grew at an average of one percent 
per three thousand lines of source code and approached a hundred percent in the 
final map of the subsurface. Geoscientists comparing the results found that "the 
differences were not subtle, corresponding to alternative equally legitimate litholog- 
ical views which can fundamentally affect the conclusions reached as to the nature 
of potential hydrocarbon accumulations." 

Hatton and Roberts write, "It is perhaps understandable for the dedicated sci- 
entist to claim that his software does not have such problems and that the seismic 
data processing community do not program well. This 'somebody else's problem' 
attitude is regrettably frequently encountered, but the authors' experience at see- 
ing much software from around the world in numerous different application areas 
combined with the large static fault study cited earlier, suggest that the seismic 
data processing development environments are if anything, more mature than the 
average, often containing a software QA function and defined test datasets." 

The bottom line is that software naturally fights reproducibility. Or, put another 
way, it is a miracle of human persistence and professionalism when reproducibility 
is obtained. 

9.2.2 Reliability 

I take the liberty of defining software as reliable if "it always does what is expected 
of it." In order to avoid the mistake of moving the focus from the end user to 
the developer, I have deliberately not added a proviso about the program design 
goals. To illustrate this point, let me use the example of a program which should, 
as an intermediate step in a larger computation, perform data smoothing. Suppose 
that on a particular occasion it does not do the smoothing, and does not alert the 
user that smoothing was not performed. It doesn't matter whether the program 
was designed that way; for instance perhaps it was designed to react to certain 
resource constraints by silently skipping the smoothing. What does matter is that 
the program delivered misleading and wrong results to the user, and therefore is 
unreliable. 

A program's behavior may be reproducible and yet unreliable. In the previous 
example, the software might reproducibly give incorrect output whenever resource 
constraints occur. As another example, consider a floating point calculation which, 
due to rounding issues in its implementation, produces less accurate results when one 
of the inputs is close to a certain value, but does not signal the decreased accuracy 
to its user. The user would likely be surprised to find that for this particular input 
value the result is 1 ± 100 instead of 1 ± 0.01. This is an instance of unreliability, 
since the program did not do what was expected of it. 
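
A concrete instance of this kind of silent accuracy loss is easy to construct. The sketch below is an illustrative example chosen for this purpose, not one taken from the text: it evaluates the function (1 - cos x) / x^2, whose exact value is close to 1/2 for small x, in two mathematically equivalent ways. The function names `naive` and `stable` are hypothetical.

```python
import math

def naive(x):
    """(1 - cos x) / x^2 evaluated literally.

    For small x, cos x is so close to 1 that the subtraction cancels almost
    all significant digits, and no warning is given.
    """
    return (1.0 - math.cos(x)) / (x * x)

def stable(x):
    """The same function rewritten as 2 sin^2(x/2) / x^2, avoiding the cancellation."""
    s = math.sin(0.5 * x)
    return 2.0 * s * s / (x * x)

for x in (1.0, 1e-4, 1e-8):
    print(x, naive(x), stable(x))
```

Both versions agree for inputs of order one, but near zero the literal version returns a value that is far from 1/2 while giving the user no indication that anything has gone wrong.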

Invariably humans (and I mean to explicitly include the reader of this article) 
instinctively grossly underestimate the intellectual effort, professionalism, and sheer 
time that is required to obtain reliable results from software, and correspondingly 
overestimate software reliability. There is abundant evidence that this wishful thinking 
is the human's invariable, natural, knee-jerk reaction to software reliability. 
Within the field of software development, prodigious effort has been devoted to 
studying both human wishful thinking and the real observed difficulty of obtaining 
reliability. However, because the results of these studies are always out of tune 
with our natural optimism, in general even organizations with much expertise and 
experience in software unreliability tend to still both underestimate the difficulty 
of obtaining reliability and also overestimate the reliability of their own software. 
While their errors may be significantly smaller than those produced by human in- 
stinct, nonetheless they are frequently wrong by multiplicative factors. For example, 
an organization may make a detailed and thorough estimate that a particular pro- 
gram will take a year to complete, and instead spend three years on it. Moreover, 
throughout the last year its estimate of the remaining time may remain constant at 
three months! If even the experts have these difficulties, then physicists in general, 
and the reader in particular, can be assured that software reliability is far harder to 
obtain than they imagine. 

It is often hard for humans, born optimists in this respect, to agree with the 
following assertion: One can be confident that any program, without exception, is 
unreliable if it has not passed a comprehensive suite of tests and has not been used 
by a substantial number of people. Yet every systematic study of software reliability, 
whether in small programs of a few dozen lines or large programs with millions of 
lines, confirms this statement unambiguously. The only reason why the software 
in widespread use, whether operating systems or the programs they host, is at all 
usable, is that it has been developed and tested at enormous cost in time, expertise, 
and other resources, and then used by thousands or millions of people. Despite all 
this, it is still painfully unreliable, the subject of continued unanimous lament. The 
important question for physicists who configure and use software, or may even write 
software, is not whether any untested software or configurations they might use are 
unreliable, but instead how to deal with the guaranteed unreliability. 

I briefly list some other disturbing facts about unreliability. These are so well 
known within the computing community that they are simply truisms. I repeat 
them, however, because the reader may not be so familiar with them. 

• Whenever software is used in a new way, a new failure mechanism will likely 
take effect. Thus, a program that is quite reliable for common tasks often is 
very unreliable for less commonly used functions. 

• When controlling or configuring software, a user will usually make mistakes 
about which settings or commands to use. For example, a graphing utility 
will produce an incorrect graph if one gives it incorrect parameters. These 
user mistakes may not be detected, and the results may be incorrect and/or 
irreproducible. Thus, one must always test and verify the results obtained 
from software, even when the software itself is reliable. 

• Practically speaking, it is impossible to create fully reliable software. Software 
reliability is a matter of degrees, and of confidence. Therefore one must learn 
to acknowledge and manage the risks of using computers. This is no excuse, 
however, for not testing the software, for in that case one can be certain that 
the results will be incorrect. 

• Most bugs will not be found unless one actively tries to prove that the software 
is not reliable [120]. Alternative approaches, for instance the approach of 
trying to prove that the software is reliable, are too friendly to the natural but 
incorrect optimism that humans bring to software. 

• Testing requires considerable intellectual effort and discipline. Design of test 
suites is at least as challenging as design of the original software [120]. 

• Once a bug is found, considerable expertise, professionalism, and time must 
be expended in order to isolate and understand it. Even then a correct solution 
can be very difficult to obtain. It is very easy to convince oneself of the validity 
of an incorrect solution. One thus fails to fix the original bug, and may even 
introduce new bugs. 

Software reliability is impossible without either sufficient testing or else widespread
adoption by large numbers of users who are willing to struggle with completely un- 
reliable software for a prolonged interval. Reliability after wide adoption is not an 
option for most scientific calculations, which are typically special purpose calcula- 
tions that need to be correct by publication. 

In section 9.3 I will discuss the fact that physicists usually restrict their checks of
computational results to global, qualitative, non-automated checks. It is well known 
in the software industry that relying on this approach alone will in the end do little 
or nothing to improve software reliability. Humans are notorious for overlooking or
discounting inconsistencies in computational results, and therefore qualitative
testing is useless. Instead one must carefully decide exactly what the software should
do when given certain inputs, and then automate a check that every detail of the 
prediction is fulfilled. Moreover, global checks alone, even if they are quantitative 
and automated, will fail to uncover more than a small percentage of the bugs. This 
is partly because many software failures occur at intermediate stages in the calcula- 
tion, partly because the actual execution of a program depends very sensitively on 
its inputs and therefore a global test probes only an exponentially small percent- 
age of the ways a program could execute, and partly because global tests tend to 
exercise the core 10% or 20% of the code and not touch many of the peripheral 
options. Therefore software reliability requires that many other varieties of tests be 
implemented [120]. One should test each input parameter by trying several different 
input values, including values which stretch the software's limits and invalid values.
Various combinations of input parameters should be tested. One should test indi- 
vidual functions (procedures) within the code. One should look inside the functions, 
analyze control flow and data flow, and design tests aimed at validating them. One 
should implement both systematic sets of tests and individual tests aimed at points 
in the code which one suspects are more likely to have problems. And when two 
or more people are available, they should read through each other's code together 
with a critical eye, asking the author to explain the behavior of the code line by line 
to the others. Whenever possible, tests should be created by somebody other than 
the author, since as a rule authors are even more unduly optimistic about their own 
code than others. 
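
As a minimal sketch of the per-parameter testing described in this paragraph, consider a hypothetical routine lattice_sites (invented for this illustration, not taken from any code discussed in this thesis) that returns the number of sites in a hypercubic lattice. The tests try typical values, boundary values, and invalid values, and each compares the result with an exact prediction written down in advance.

```python
def lattice_sites(width, dimension=3):
    """Return the number of sites in a hypercubic lattice (hypothetical example)."""
    if not isinstance(width, int) or not isinstance(dimension, int):
        raise TypeError("width and dimension must be integers")
    if width < 1 or dimension < 1:
        raise ValueError("width and dimension must be positive")
    return width ** dimension


def run_tests():
    """Check typical, boundary, and invalid inputs; return the number of checks passed."""
    passed = 0

    # Typical values, with the exact expected answer decided beforehand.
    for width, dimension, expected in [(20, 3, 8000), (4, 2, 16), (7, 1, 7)]:
        assert lattice_sites(width, dimension) == expected
        passed += 1

    # Boundary value: the smallest legal lattice.
    assert lattice_sites(1, 1) == 1
    passed += 1

    # Invalid values must be rejected loudly, not silently accepted.
    invalid_cases = [((0, 3), ValueError), ((-5, 3), ValueError),
                     ((2.5, 3), TypeError), ((4, "3"), TypeError)]
    for bad_args, expected_error in invalid_cases:
        try:
            lattice_sites(*bad_args)
        except expected_error:
            passed += 1
        else:
            raise AssertionError(f"{bad_args} was accepted")

    return passed


if __name__ == "__main__":
    print(run_tests(), "checks passed")
```

The point of the design is that every check is automated and quantitative: a failure stops the run with an error instead of relying on a human to notice that something looks wrong.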

Testing is also needed when the physicist is re-using another's software. These 
tests can be broken down into two types: Tests of the software and its installation, 
and tests of the user-specific aspects: various input data, configuration files, scripts, 
and any manual actions to control the software. I first discuss the software and its 
installation. Any error messages that occur during installation should be recorded, 
and any log generated by the installation should be archived. In ideal cases software 
will come accompanied by an automated test suite [52]; in this case the suite should 
be run after installation, and also at any point when one begins to wonder whether 
the software is broken. However this is not enough: One should review the test suite 
(hopefully it is well documented; otherwise one will end up reading its source code) 
and see how much it tests the functionality that will be used. One would hope that 
the test suite will not only thoroughly test each individual option or command that 
will be used, but also include global tests of scenarios that closely approximate what 
the user will be doing. One should also consider whether the software will be used 
for a task that isn't done very often by other researchers, or perhaps for two tasks 
that aren't done in combination very often. 

When this examination shows that the test suite is missing tests, the physicist 
will have to create them herself. In many situations, including most commercial 
software, no test suite is supplied. In other cases a test suite will be supplied 
but without source code or documentation. In these cases, one must consider the 
entire functionality to be at risk and either avoid using it entirely or implement a 
test suite of one's own that thoroughly tests the functionality of interest, including 
each individual option, command, or step in the computation. For instance, if 
the physicist is using a symbolic algebra program to manipulate gamma matrices, 
she should create a set of scripts that demonstrate the program's ability to do all 
the individual matrix manipulations correctly and also to do a few very involved 
manipulations. Of course such testing efforts will be greatly hindered if the source 
code of the software is not available. This argues strongly that all scientific software, 
commercial or not, should be accompanied by a thorough, well documented, open 
source test suite. 

I now turn to testing those aspects of the calculation which are unique to the 
user. This testing will include a detailed review of the configuration files and scripts 
created by the user, preferably with two or more people examining them together.
The testing will also involve checking each configuration option and command to 
verify that it actually does what the user expects it will do, both by reading the 
option or command's documentation and by running it and checking the results. 
Data, too, should be examined thoroughly, and one must ascertain that the software 
is able to read the data correctly. For instance, when graphing data one should verify 
the graph by opening the data file, actually reading some of the data points, and 
checking that they are represented accurately on the graph.
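
Part of this data check can be automated. The following sketch assumes a hypothetical two-column, whitespace-separated text format with "#" comments; the function names are my own. It writes a few known points to a file, reads them back with the same reader that would feed the graphing step, and confirms that every value survives the round trip. It does not replace looking at the graph itself.

```python
import os
import tempfile


def read_points(path):
    """Read (x, y) pairs from a two-column text file, skipping '#' comments and blank lines."""
    points = []
    with open(path) as handle:
        for line in handle:
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            x_text, y_text = line.split()
            points.append((float(x_text), float(y_text)))
    return points


def check_round_trip(known_points):
    """Write known points, read them back, and return True if they match exactly."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "points.dat")
        with open(path, "w") as handle:
            handle.write("# x  y\n")
            for x, y in known_points:
                # repr() of a float round-trips exactly through float().
                handle.write(f"{x!r} {y!r}\n")
        return read_points(path) == list(known_points)


if __name__ == "__main__":
    sample = [(0.0, 1.5), (0.1, -2.25e-7), (1e10, 3.141592653589793)]
    print("round trip ok:", check_round_trip(sample))
```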

The last paragraphs should not be construed as a complete description of the 
practices necessary to obtain confidence in a calculation's results. I recommend 
readings on the disciplines of software testing [120] and of verification and validation 
[121-124], and I also expect that a professional care for accuracy and thoroughness 
will naturally suggest further tests. 

9.2.3 Complexity 

Computers are extremely complex. Here I discuss two aspects of that complexity: 
the way computers get things done, and the information overload they produce. 

First, the way computers get things done. It is extremely difficult to understand 
at the lowest level (machine code, registers, paging, call stacks, etc.) the inner 
workings of even the simplest program. But let's leave that aside and move to 
a higher level of abstraction, to the level of short programs written in high level 
languages like C++ or FORTRAN. The problem is that tasks that are conceptually 
very simple can be extremely complicated on a computer. Even when the source 
code for the task is short, understanding what that code actually does and what 
results it will produce is often far beyond modern capabilities. Consider diagonalizing 
a matrix. This task is so conceptually simple that linear algebra books often present 
only one or two simple algorithms for diagonalization, and physics books generally 
skip its details altogether. Despite this conceptual simplicity, the design of good 
algorithms for diagonalization is an apparently infinitely complex issue. It has been 
under continuous research since the introduction of computers and is by now a 
very involved research field [125]. Interestingly enough, although the algorithms 
which this field invents can often be described with just a few dozen lines of text, 
the details of their behavior and accuracy often resist years of study by expert 
numerical analysts. 

Now consider scientific software running on a modern PC, which typically per- 
forms billions of arithmetical operations per second and is equipped with at least 
two billion bits of random access memory. Scientific software often takes advantage 
of much of that memory (and other resources: hard disk, DVD, Internet, and vari- 
ous peripherals) to arrive at its results. Moreover, the source code used to specify 
how to arrive at those results can easily have a length equivalent to several books. 
That is not counting the fact that modern software generally relies on the operating 
system to do a lot of things, and that the source code for an operating system has 
a length equivalent to several hundreds of books. If it is often impractical to obtain 
a detailed understanding of the behavior of only a few dozen lines of source code,
then clearly the details of the behavior of most useful software are far beyond our 
comprehension. 

This brings me to a second aspect of complexity: information overload. We are 
drowning in information. I doubt that the reader needs to be reminded of the reams 
of data which are produced by calculations and computer-assisted experiments alike, 
the challenge of storing it all, the difficulty in understanding and recording the exact 
configurations and parameters that went into producing the data, the mind numbing 
tasks of analyzing and understanding the data, or the continuing uncertainty that
one's analysis was entirely correct. 

Perhaps information overload is the root of all the other computational difficul- 
ties I have discussed. Probably if there were only ten or twenty things I needed to 
know about a computer then I would have little problem making it do the same 
thing over and over again (reproducibility), with understanding whether it is working
correctly (reliability), or with fixing it until it does work correctly (reliability again.) 
A hammer would be a good example of this - despite the fact that it is made of 
incredibly complex materials, I really just need to know its heft, its size, its swing. 
Even then, figuring out how to hit a nail straight on can be very challenging: but 
anyone will learn that, given enough months of practice. 

Judging by human success with complex machinery (a spectrum from ground 
vehicles to jumbo jets to factories to nuclear reactors), under ordinary conditions 
systems with some thousands of important variables can be managed fairly reliably. 
But already the situation is qualitatively different: one needs highly trained and 
experienced specialists. 

The problem with computers is that they are absolutely qualitatively different 
from these two examples. This should be obvious: one can make a count of the 
million plus pixels on the screen, and then remember that we always want to know 
more than what's on the screen (that's why windowing operating systems were 
invented.) If one likes, one can also count sheep: the billions of bits in memory, 
trillions on disk, and quintillions on the net. A computer is not like a hammer, it 
is not like complex machinery, in fact it is not like any other tool that has been 
invented. 

9.2.4 Is Software a Mechanism? 

Software is often perceived as a kind of mechanism, a machine independent of 
both the software developer and the user, which functions or fails according to its 
own internal composition and laws. I want to highlight two particular ways that 
this perception manifests itself in real life. First, physicists may believe that the 
tasks they perform with computers are not really physics; and may imagine that 
they are simply using a tool ("thinking machine" still evokes the gestalt) to get 
a particular job done [126]. Because many believe that computing amounts to 
operating a machine and is certainly not physics, there is a tendency for physicists 
to not include discussions of software in their professional discourse and published 
articles, and to relegate computing tasks to graduate students. Second, the public as
a whole understands bugs as a sort of machine failure, a breakage. As a consequence
of this analogy with machine failure, people incorrectly assume that bugs are the
exception rather than the rule, that it is possible to examine software beforehand and
see whether it is in good shape, that if software were engineered properly then it 
would be reliable and its developers would never have to make changes to it, that the 
culpability for any particular software failure can be traced to a mistake of some sort, 
and that very rudimentary descriptions of software failures should suffice for others 
to quickly identify and fix them. All of these beliefs accurately describe ordinary
tools and machines, and yet are absolutely wrong when applied to computers. 

I would like to suggest that real world software is not best understood as a 
machine or mechanism. While textbook algorithms may meet this definition, in real 
life software is better understood as a way of communicating, like scholarly discourse, 
literature, et cetera. In other words, it is far better to compare a computer to a library
than to a machine. The analogy between software and communication can teach a 
lot: 

• Software (meaning the source code) is written in a language. It is very impor- 
tant that software be understandable to humans, in order to assist both the 
original programmer and future programmers in their efforts to understand,
check, fix, and improve it. In fact, usually human assistance is required to
make the software understandable (compilable) for computers, and the hu- 
mans are unable to assist without understanding the code themselves. One 
can argue that the most important audience for source code is not the com- 
puter but humans. 

• Careful observation of human speech shows that we have immense difficulty
with speaking grammatically; it should not be surprising that writing correct 
programs is also difficult for us. 

• Words derive their meaning from their context; software as well works only 
within the context of being compiled, run by an operating system, receiving 
input, and possibly connecting with other programs or other computers. 

• Texts can be considered valid or invalid only within a certain context, and
in fact usually a dialogue is required before meanings become entirely clear. 
Software too is never entirely correct but is always subject to further critiques 
by new users. Thus a continuing relationship between developer and user is
required, which usually is called bug fixes, new releases, or maintenance. 

• If one studies any text in detail, one will inevitably find unanswered questions,
incompletely specified terminology, subtle or glaring inconsistencies, and perhaps
even grave errors. Experience from the study of literature indicates that
this process of discovery can go on for centuries. This is similar to software, 
where every program contains bugs, and the only certainty is that the longer 
software is studied the more bugs will be found. 

• Earlier I briefly described the huge amount of effort which is spent on recording 
every detail about software and its history. I did not mention the exacting 
and detailed analysis devoted to the software; teams reading through code 
together, detailed specifications, formal diagrams, test plans, and exacting
records of its behavior and failures. All of this is very reminiscent of the 
commentary and tracking devoted to certain texts. The most extreme cases 
are probably religious texts; for instance every manuscript of the Bible and 
the Talmud is tracked and compared character for character. Shakespeare's 
works, and other literary texts, are also the object of exacting study. There is 
also a parallel with scientific literature: every scientific article is permanently 
recorded in a scientific journal, and a lot of attention is given to how articles
comment on and build on other articles. 

In summary, I am suggesting that real world software is best understood as 
being subjective, not objective. By "objective" I mean having a reality external to 
and separate from the individual observer, the way that a hammer or machine is 
the same thing whether you put it in a parking lot or in the Sahara desert. I use 
the word objective in the sense of an external, separate, objectified entity, not in 
the sense of true-ness or false-ness. Something may be objective and yet false, for 
instance an incorrect theory. 

In saying that software is subjective, I mean that its meaning and validity are 
given to it by people through some sort of interaction with each other, and depend
on context supplied by communities, communication, or personal experience. 

9.3 Current Physics Practice

I now attempt to give a picture of how the physics community manages the com- 
puting problems which I discussed. This picture is based on a survey of physics 
publications and software which I did in May 2003. I had planned to redo the survey,
this time extending its reach by surveying journal policies, and extending its
thoroughness by using certain new full-text search engines. Unfortunately I was
unable to accomplish these plans before finishing this thesis.

The physics community uses software in several different ways, each of which 
has its implications for reliability and reproducibility. Here are the most common 
usage patterns: 

• Operating Systems, Utilities, Languages, and Scripting Engines. These
resources usually have a wide user base extending far outside the physics
community. They often are compiled before distribution to users.

• Scientific Programs. Many scientific programs have been made available for 
re-use, either as source code or in compiled form. These resources often have 
a restricted user base. I will differentiate two classes of scientific programs: 
The first class is large codes developed and used by a community on a con- 
tinuing basis; for example codes for fluid dynamics, linear algebra, and partial 
differential equations. The second class is codes that are written by a few re- 
searchers who stop development at some point and make their code available 
to the public. 

• Reusable Numerical Results. Often models are obtained through complex 
analysis procedures, and then are re-used without redoing the analysis. Ex- 
amples include nuclear potentials, pseudopotentials, and tight binding hamil- 
tonians. 

• Unshared Resources. Almost all research involves creation of resources that 
are not distributed to the public. These are widely assorted, but may include 
configuration files, programs, and scripts. 

9.3.1 Efforts to Ensure Reproducibility 

No survey data is available about the procedures used by individual physicists to 
ensure reproducibility of their computing results. However, two facts are clear: 

• Few, if any, individual research articles make available for on-demand
download the complete set of files necessary to reproduce their results. Even the
minority of research articles which do this usually do not distribute all the
supplementary files needed to obtain the published graphs and numbers.

• Physics research articles tend to not report version information about their 
computing environment. One can get a flavor for this by searching physics 
abstracts for the names of popular pieces of software. On May 15, 2003, 
searches of arxiv.org showed that only 35 physics abstracts [127] mentioned
"linux," 57 [128] mentioned "mathematica," and 117 [129] mentioned "fortran."
Possibly even fewer articles report version numbers; for instance only
eleven of the 35 linux articles provided version data, and many of them gave 
only partial data. 

We can read between the lines. It is likely that most individual physicists do not
track everything necessary for reproducibility. Probably most are unaware of the 
advantages and availability of source code control software. Without it, they are 
limited to making (partial) backups of their files. Even backups can be problematic 
in those institutions which do not provide recordable mass media and automated 
backups to their researchers. In these circumstances, it seems likely that many 
individual researchers will have difficulties giving directions to others about how 
to reproduce exactly their own research results. They may even have difficulties 
reproducing their own results themselves. 
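
As a minimal sketch of the kind of record that helps a calculation to be repeated, the following Python fragment collects version information about the computing environment together with checksums of the input files. The function names and the choice of fields are illustrative assumptions of mine; a real project would also record compiler and library versions and the revision of the source code.

```python
import hashlib
import json
import platform
import sys


def file_checksum(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def environment_record(input_paths=()):
    """Collect version data and input checksums (illustrative choice of fields)."""
    return {
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "command_line": list(sys.argv),
        "inputs": {path: file_checksum(path) for path in input_paths},
    }


if __name__ == "__main__":
    # Store this record next to the results so that the two are archived together.
    print(json.dumps(environment_record(), indent=2, sort_keys=True))
```

Writing the record to a file alongside every set of results costs almost nothing, and it answers the question "which versions produced this graph?" without relying on memory.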

Collaborations developing large codes often do better with reproducibility prac- 
tices. Most distribute one or more releases, each with clear and distinct version in- 
formation. For instance, LAPACK [52] currently offers release three to users, while 
MPICH [130] offers many versions of its compiled binaries. Many large projects also 
use source code control software. Some of these, like ATLAS [131], PETSc [132], 
and GSL [133], allow anonymous read-only access to all versions of the source code. 

9.3.2 Attempts to Reproduce Results 

There are some cases where many physicists do calculate the same physical quantity. 
For instance, there have been many lattice QCD calculations of meson and hadron 
spectra, and comparisons of them have been published. However, no single spectrum 
calculation, using all the same algorithms and constants etc, is reproduced by two 
parties. So it's basically a lot of people doing it different ways and getting results 
that have some similarities and some differences. 

Moreover, most published scientific results cannot be directly compared with 
any other result. That is, nine out of ten graphs are of non-standard quantities: different
selection criteria, slightly different formulas, different parameters (energy of the 
probe, etc.), different approximations, and on and on. 

It is likely that there are also unpublished, undocumented attempts to reproduce 
results. The results of these efforts may be communicated by word of mouth, and 
errors may never be documented [134]. One reason for this could be a perception 
that publicizing an error and its correction would humiliate the original researcher. 
Alan Karp [135] gives an example of a conference where many groups simulating 
Cepheid variables got together and compared each group's graph of a single physical 
quantity. It could be that many of these unpublished attempts at reproducing results 
do not go through all the steps necessary to get exact quantitative agreement. 

9.3.3 Bug Awareness and Management 

The physics research literature does not discuss the reliability of its computing 
efforts. In particular, the possible existence of bugs is almost never acknowledged. 
On May 15, 2003, a search of all physics abstracts and titles in arxiv.org listed only 
four abstracts [136] containing the words "bug" or "software defect." (There is,
however, some evidence that bugs do occur: 54 articles [137] had attached author
comments stating that the article had been revised because of a bug. Very little 
further data was supplied. Moreover, 928 records [138] contained the word "errata" 
or "erratum," and one must wonder how many of these were caused by software 
issues.) 

Unfortunately, the text of an erratum generally does not discuss how the error 
occurred. There is one exception where physicists explicitly documented the
computing failure that had caused an incorrect result, probably because of the result's
extremely high visibility. On February 8, 2001, a collaboration at the Brookhaven 
laboratory reported a new measurement of the anomalous magnetic moment of the 
muon [139]. The result was very intriguing because it deviated from the theoretical 
prediction by 2.6 standard deviations, and because the collaboration had plans to 
quickly increase their experiment's accuracy, in which case the deviation could have 
reached 5 standard deviations [140]. If this had occurred, it would have been a 
convincing sign of physics beyond the standard model. A citation search on SLAC's 
SPIRES database shows that by the end of October 2001, 226 preprints had been 
distributed citing the Brookhaven result. Many of these papers were theoretical ef- 
forts to explain the discrepancy using physics beyond the standard model. It seems, 
however, that the original theoretical prediction (which relied only on the standard 
model) was not checked until October or so. On November 6, 2001, Knecht and 
collaborators distributed two preprints suggesting that the prediction should be re- 
vised [141,142]: a particular Feynman diagram contributing to the result had been 
given the wrong sign. A few months later one of the physicists responsible for the 
original predictions, Kinoshita, reported in some detail his collaborative effort with 
Hayakawa to recheck the calculations, and their discovery of the source of the sign 
error [143]. Hayakawa and Kinoshita write:

"We noticed one crucial difference between Ref. 5 [Knecht and Nyffeler's paper],
which leads to the positive value, and ours, which leads to the negative value; while
Ref. 5 used the algebraic manipulation program REDUCE to perform the trace
calculation of the $\gamma$ matrices, we used FORM instead. Recall that we have used
FORM even for examining the results of Ref. 5. Thus we decided to check whether
we handled FORM properly. The program FORM had been used successfully to
calculate the QED corrections to the $g-2$ of the muon and the electron as well as
other observables by one of the present authors. However, this does not guarantee
that we deal correctly with the $\epsilon$-tensor, the central object of our study of the
pseudoscalar contribution. A simple test of this question is to see if our naive use of
the FORM declaration (FixIndex 1:-1, 2:-1, 3:-1;) works successfully to verify
the identity

$$\epsilon^{\alpha\beta\gamma\delta}\,\epsilon_{\alpha\beta\gamma\delta} = -24, \qquad (9.1)$$

which should hold in Minkowski space-time. Unfortunately, the result turned out
to be +24. This means that this simple declaration does not work for the $\epsilon$-tensor
in Minkowski space-time. ... On the other hand, REDUCE passed the same test
without difficulty."

This incident merited a one page report in Physics Today [140], in which Bertram
Schwarzschild explains that the FORM program's output of +24 instead of the
expected -24 was appropriate in the Dutch convention for the Levi-Civita tensor
$\epsilon_{\alpha\beta\gamma\delta}$. (Schwarzschild mistook the author of FORM; it should be correctly attributed
to Dutch physicist Jos Vermaseren [144].) This was nonetheless an instance of soft-
ware unreliability, simply because there was a mismatch between FORM's behavior
and Hayakawa and Kinoshita's expectations. (The question of whether the fault
should be ascribed to FORM or to Hayakawa and Kinoshita does not affect the
diagnosis of software unreliability.) The error could have been avoided if Kinoshita
and collaborators had done more thorough testing of the software functionality that
they were using. Better documentation, and reading it, can also sometimes help to
prevent this sort of error.

It is interesting to speculate how much longer the physics community would
have required to discover the problem if Knecht and collaborators had used FORM
instead of REDUCE, or had reacted to the psychological pressure to reproduce 
known results by "correcting" REDUCE's sign in some way. 
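
A focused test of this kind is short. As an illustration only, the following sketch is plain Python, not FORM or REDUCE. It assumes the conventions that the Levi-Civita symbol with lower indices 0123 equals +1 and that the Minkowski metric is diag(+1, -1, -1, -1); neither is taken from the papers cited here. It builds the symbol, raises its indices with a chosen diagonal metric, and evaluates the full contraction.

```python
from itertools import permutations


def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one, by counting inversions."""
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def full_contraction(metric_diagonal):
    """Evaluate eps^{abcd} eps_{abcd} for a diagonal metric with entries +1 or -1.

    Assumed convention: eps_{0123} = +1 with all indices lowered.
    """
    total = 0
    # Only index sets that are permutations of (0, 1, 2, 3) give nonzero terms.
    for indices in permutations(range(4)):
        lower = permutation_sign(indices)
        # Raising an index multiplies by the inverse-metric entry, which equals
        # the metric entry itself when that entry is +1 or -1.
        upper = lower
        for index in indices:
            upper *= metric_diagonal[index]
        total += upper * lower
    return total


MINKOWSKI = (1, -1, -1, -1)
EUCLIDEAN = (1, 1, 1, 1)

if __name__ == "__main__":
    print("Minkowski:", full_contraction(MINKOWSKI))
    print("Euclidean:", full_contraction(EUCLIDEAN))
```

Calling full_contraction(MINKOWSKI) returns -24, while full_contraction(EUCLIDEAN) returns 24. A test of this size, run against the actual tool with the actual declarations one intends to use, makes a convention mismatch visible before it propagates into a published number.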

I now move on to other aspects of bug awareness and management. Despite a 
lack of survey data, it is safe to say that the majority of physics research efforts do 
not use any bug tracking databases to manage software problems. It seems likely 
that most physicists are unaware of the availability and utility of these solutions. 
However, some of the larger codes and programs probably maintain internal bug 
tracking databases that are not visible to the public. 

Almost all publicly available physics software relies on e-mail for bug reporting; 
larger programs and codes may have e-mail addresses dedicated to user support, 
while smaller ones typically use the author's e-mail address. Response to bug reports 
is rarely, if ever, guaranteed. 

Many of the smaller publicly available codes do not offer the public any informa- 
tion about their bugs; witness in particular the CPC Program Library [145], which 
appears to make no provisions for documenting bugs in the programs it houses. 
There are some exceptions to this rule, in particular those codes that are housed 
by sourceforge.net [146], which provides for each hosted project a bug tracker, a 
support tracker, a newsgroup, and a searchable archive of the newsgroup. 

Larger codes are more likely to document bugs: for instance LAPACK [52] has 
an erratum page, the GNU Scientific Library [133] provides searchable archives of
its newsgroups, and the APE supercomputer's operating system allows anonymous 
users to view its bug database. Among commercial packages, public bug documen- 
tation is spotty: for instance MATLAB [147] documents both bugs and their fixes 
on its web site, while Mathematica [148] and GAUSSIAN [149] do not.

9.3.4 Testing 

The physics research literature rarely discusses either software testing or the closely 
related term "verification and validation." Searches through arxiv.org in May 2003 
revealed that only 46 abstracts [150] contained both the words "software" and
"test," while fifteen [151] contained two of the following three words: "software,"
"verification," and "validation." One may presume that some of these articles do not
really discuss software testing. 

To my knowledge, the most comprehensive attempt in the physics community to 
ensure a code's reliability is an ongoing collaboration to test an astrophysical mod- 
elling program called FLASH [152]. This collaboration has adopted the methodology 
of Verification and Validation [121-124], which was developed mostly within the field 
of aeronautics, largely because of the immense pressure to produce reliable aircraft 
and space vehicles. It is also practiced in other situations where there is a lot of
motivation to get things right, like combat simulations, simulations of nuclear bombs
at Sandia and elsewhere, and certain environmental impact studies [118,126,153-156]. Very
roughly speaking, verification corresponds to checking a program's reliability, while 
validation corresponds to checking that its predictions match reality. The FLASH 
collaboration is probably the first instance where members of the physics research 
community have adopted verification and validation techniques, and is a very ex- 
ceptional case. By and large the physics community worries much much less about 
the reliability of its computational results. 

Some of the larger publicly available codes do make test suites available to 
their users, although sometimes they may be hard to understand. For instance,
the popular LAPACK [52] mathematical library includes extensive performance and
functionality test suites, and its recommended installation procedure includes run- 
ning these tests and reviewing their results. (However, the tests themselves are not 
well documented, so understanding test coverage or failures requires reading the 
test source code.) BLAS [157], the popular standard for fast matrix operations, 
also includes a test suite. (But it may not be well maintained - one well known bug 
concerning the ATLAS library had not been fixed several years after the introduction 
of ATLAS.) The MPICH library [130] for parallel programming also includes both 
functionality and performance tests in its source tree. Smaller codes seem much 
less likely to be accompanied by test suites. In any case, it is not known how often 
physicists run those test suites which are available. 

Software that is distributed in a compiled format usually does not include a test 
suite. For instance, MPICH's [130] precompiled distribution for Windows does not
include its test suite, and Mathematica does not distribute any tests at all. 

Since software distributed in a compiled format is rarely accompanied by test 
suites, it would be logical to do focused testing of those functionalities used in one's 
research. The physics research literature rarely if ever reports any such checking. 
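
As a hedged illustration of what such focused testing might look like, the following Python sketch checks a single library routine against independent facts about it. The choice of the error function from Python's standard `math` module is only a stand-in for whatever closed routine one's research depends on; the helper names `erf_series` and `focused_checks`, the sample points, and the tolerance are assumptions made for this example.

```python
import math

def erf_series(x, terms=30):
    """Independent reference: Maclaurin series of erf, adequate for small |x|."""
    total = 0.0
    for n in range(terms):
        total += (-1) ** n * x ** (2 * n + 1) / (math.factorial(n) * (2 * n + 1))
    return 2.0 / math.sqrt(math.pi) * total

def focused_checks(points=(0.1, 0.5, 1.0), tol=1e-12):
    """Return a list of failed checks; an empty list means every check passed."""
    failures = []
    # Known exact value.
    if math.erf(0.0) != 0.0:
        failures.append("erf(0) != 0")
    for x in points:
        # erf is an odd function.
        if abs(math.erf(-x) + math.erf(x)) > tol:
            failures.append(f"odd symmetry fails at x={x}")
        # erf and its complement must sum to one.
        if abs(math.erf(x) + math.erfc(x) - 1.0) > tol:
            failures.append(f"erf + erfc != 1 at x={x}")
        # Comparison with an independently coded series.
        if abs(math.erf(x) - erf_series(x)) > tol:
            failures.append(f"series mismatch at x={x}")
    return failures

if __name__ == "__main__":
    print(focused_checks() or "all focused checks passed")
```

The point of the sketch is that each check is planned, quantitative, and repeatable, so it can be rerun whenever the library or the platform changes.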

In summary, physicists generally do not test their computing. There is one 
exception, which is the focus of physicists' effort: global validation of the final
research results. Research articles often report a handful of checks of their principal
results. Internal consistency may be validated by seeing whether the result seems 
to have converged and whether relevant conservation laws and sum rules are satis- 
fied. Results may be validated by comparison with rough approximations, previous 
calculations, and experimental data. However as a rule this checking remains at a 
very global level; this type of check is often called "system level testing" within the 
software development community. Moreover it is often done on a qualitative rather than
quantitative basis: many physics publications ask the reader to evaluate the agreement
of results by making an eyeball comparison of graphs, rather than supplying
numerical measures of agreement. 
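
To make concrete what a numerical measure of agreement could be, here is a minimal Python sketch. It assumes that the two results being compared are sampled on the same grid; the function name `agreement_measures` and the particular three measures are choices made for this illustration, not a standard prescribed by any source cited here.

```python
import math

def agreement_measures(reference, candidate):
    """Quantify agreement between two curves sampled on the same grid."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("curves must be non-empty and sampled on the same grid")
    diffs = [c - r for r, c in zip(reference, candidate)]
    # Largest pointwise deviation.
    max_abs = max(abs(d) for d in diffs)
    # Root-mean-square deviation.
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Norm of the difference relative to the norm of the reference.
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    ref_norm = math.sqrt(sum(r * r for r in reference))
    if ref_norm > 0.0:
        rel_l2 = diff_norm / ref_norm
    else:
        rel_l2 = 0.0 if diff_norm == 0.0 else float("inf")
    return {"max_abs": max_abs, "rms": rms, "rel_l2": rel_l2}

if __name__ == "__main__":
    print(agreement_measures([3.0, 4.0], [3.0, 4.5]))
```

Reporting even a few such numbers alongside a graph lets a reader judge agreement without relying on the eye alone.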

Physicists may do further checking before reusing code or numerical results from 
previous research within their field. Certain important numerical results tend to 
obtain the most scrutiny: some examples include the many papers reporting checks 
of pseudopotentials, tight binding hamiltonians, and fits to the nuclear potential. 
However, usually redistributed codes are used without any report of checking their 
validity. The exception to this rule is the research genre devoted to comparing 
several algorithms, though even here the code may be written from scratch, and 
comparisons tend to remain at the global level. 

Another sort of global check is peer review, which is one of the physics com- 
munity's most successful and important tools for maintaining its own professional 
standards. It is also the only reliability-oriented practice which is universally prac- 
ticed, and as such is invaluable. Here I make one point: peer review is generally a 
global check on the paper's results, not a detailed check of the computing used to 
obtain those results. This orientation is caused partly by the physics culture, but 
also partly by the fact that reviewers - often confined to the information contained 
in the text being reviewed - may not have sufficient information to try reproducing 
the calculations. 

9.3.5 Journals

In my survey of the physics community I have neglected to investigate the policies 
of physics journals. I received a few remarks about this from reviewers of the 
first draft. Hans Petter Langtangen [158] wrote: "If you look at the scientific 
traditions in experimental physics, you see that reliability is very much in focus. 
Scientific computing has much of the same nature as experimental physics, but 
in publications the quality standard of experiments are very much higher than for 
computational experiments. Scientific computing could simply adopt the standards
of well-established experimental fields... As a frequent referee I have systematically
required high standards in publications based on numerical standards. I feel I usually
have a good support from editors. The problem is that few editors are willing to 
require such standards in written form as a part of the description of the journal..." 

Stan Scott, the director of the Computer Physics Communications Program Li- 
brary, the only code archive that I'm aware of that is directly associated with a 
peer reviewed physics journal, also took the initiative to contact me [145,159]. He 
wrote that "One of the aims of CPC is promote basic good practice in scientific
programming." He also shared with me the referee report form. Here are some 
excerpted questions posed to the reviewer: "Is the research underpinning the com- 
puter program sound? Is the computer program of benefit to other physicists or 
physical chemists, or is it an exemplar of good programming practice, or does it 
illustrate new or novel programming techniques which are of importance to some 
branch of computational physics or physical chemistry? Is the computer program 
well engineered and does it meet accepted standards for scientific programming? 
Does the manuscript make clear the structure, functionality, installation, testing 
and operation of the program? If you were supplied with the program did it install,
run and perform as described in the documentation." 

Before submitting the final version of this present chapter for publication in a
peer-reviewed journal, I will do a more systematic review of physics journals.

9.3.6 Summary of Current Physics Practices

It seems that while many physics articles use software to compute various results, 
perhaps few authors have implemented the most basic practices for ensuring its 
quality - whether planned and repeatable test suites, source code control, or pub- 
lication of their code, scripts, and configuration files. Moreover, there does not 
appear to be a structure for reporting bugs, documenting them, or discussing their 
prevention. 

Clearly, groups creating and maintaining large codes are more likely to address 
issues of reproducibility and reliability. They often version their releases, use source
code control, distribute good documentation, and provide dedicated e-mails for 
software support. However, there are still notable deficiencies: a frequent lack of 
public documentation of bugs and fixes, and a lack of test suites for executables 
like Mathematica.

The situation is much different for smaller research efforts and for the physics 
research literature. The chief tools for ensuring research quality are global qualitative
checks of each paper's results: global checks done by the paper's author, followed 
by peer review. Neither of these practices is oriented toward detailed checking of 
computational results. In general, post-publication checks by other physicists are 
greatly restricted by the normal practice of not publishing the information necessary
to reproduce one's results. Regarding normal software development practices for
assuring reliability, they are usually not implemented: Bugs and testing are not 
discussed in the literature. Specifications, source code control, test suites, and bug 
tracking tools are not used. 

It seems reasonable to conclude that physicists regard their computing activities 
as peripheral to their research even when they publish the results of their computations,
that physicists "don't consider [reproducibility] to be an essential element in
the quality of their work," and that they don't consider testing to be essential ei- 
ther [126]. Tim Trucano of Sandia writes [126]: "subject matter expertise in physics 
is far more cherished than CS [computer science] expertise and it shows." Neither 
my survey of the physics community nor my one and a half years of subsequent 
experience give much hint of any overall improvement in this picture. 

9.4 Problems for Physicists 

I now reconsider the computing problems presented earlier in this chapter in the light of the
physics community's practices documented in section 9.3. We will see that the
two can combine to create difficulties for physicists and even significant delays in 
scientific progress. 

9.4.1 Reproducibility 

The sciences are, in varying degrees, hard or soft, and physics is traditionally the 
hardest of the experimental sciences. Hard science distinguishes itself by restricting 
its study to subsets of reality obeying two requirements: Firstly, it must be inde- 
pendent of the observer; i.e. the same independently of the observer's identity, and 
even of the existence of the observer. (Note that the notion of independence is 
given more precision in quantum mechanics, where the expectation value, not the 
wave function, is expected to be independent of a probe.) In other words, hard 
science studies only things that are objective in the sense I defined earlier.

Secondly, hard science focuses on realities that are under so much control that 
they are reproducible: at will, by anybody, as many times as desired, and (if desired) 
with somewhat different parameters to see what changes. These two requirements of 
independence and reproducibility are not separate: if an experimental or theoretical 
result can not be reproduced at will by many different scientists, there is no way 
to determine whether the object of study is independent of the observer. Thus 
reproducibility, whether of experiment or calculation, is essential to hard science. 

Hardness is not a binary value, but varies continuously as the requirements of 
independence and reproducibility are relaxed. For instance in astronomy the stars 
are not in the observer's control and cannot be made to repeat their behavior. 
This causes certain systematic problems of both theoretical and practical natures. 
However, because the stars can be measured precisely by many different observers, 
and because arguments considering the set of all stars as a statistical ensemble 
work pretty well, astronomy is considered a hard science. On the other hand one 
can consider sociology and psychology, where reproducibility is impossible to obtain 
except in very restricted circumstances; these fields are therefore considered soft
sciences. 

Nature is complex, humans are error prone and have limited resources, and
therefore physicists usually do not take the time to reproduce others' results. When 
one reads a theoretical paper one rarely re-derives every single line from scratch. 
And most experimental results are not reproduced (with all parameters the same) 
by other teams. Typically the only reasons why a physicist would actually repeat 
somebody else's results are to learn the technique, or else because the result is either 
important or suspicious. And typically the goal of such repetition is not to produce 
an exact replica of the original experiment in all its details, but instead to
obtain the same physical result [135,160,161]. 

However, there is a constant emphasis on trying to ensure that one's results can 
be reproduced by oneself and by others if desired. Within theoretical physics, the 
principal means for allowing reproducibility are three disciplines: 

1. One goes through formal derivations before publishing results. 

2. In publications one states one's starting assumptions, cites references or ex- 
plains in detail when a lesser known mathematical technique is used, explains 
at least the major steps of one's derivations, and clearly explains the final 
results and their applicability. 

3. Publications are submitted to peer review, and the referees are expected to 
do some degree of checking on the validity of the result. 

Alongside these practices is the ethical requirement that one cannot keep the details
of a proof secret [160]. A physicist who claims to have derived a particular result 
but is unwilling to divulge its derivation is simply not professional. 

The author is less familiar with the steps which experimental physicists take to 
achieve reproducibility, but suspects that they are more conscientious than theorists, 
keeping detailed daily records of their experimental design, actual configuration, and 
results. It is unusual to find theoretical physicists who keep similar daily records of
their work. 

If the adoption of computers merely introduced one more difficulty in obtain- 
ing reproducibility similar to all the others inherent in the scientific process, then 
there would be little reason for concern. This is not the case. Introducing comput- 
ers makes a qualitative change in scientific activity: Without computers one has a 
reasonable hope that results can be reproduced with only the aid of a laboratory 
notebook or possibly a scientific article. Once computers have been introduced, 
there is no hope of obtaining reproducible results unless someone has used tremen- 
dous care to get the software right, and unless the original scientist has kept detailed 
records of the settings and configuration used to obtain her results. And if she has 
not kept detailed records of the final results, there is no way to tell whether one
has succeeded in reproducing them. 
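
The kind of record keeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `make_run_record` and `reproduces`, the fields stored, and the example parameter names are all hypothetical, and the comparison is an exact one on a canonical text form of the results, which is stricter than the tolerance-based comparison that floating-point work usually needs.

```python
import hashlib
import json
import platform
import sys

def _fingerprint(results):
    """Hash a canonical JSON form (sorted keys) of the results."""
    text = json.dumps(results, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def make_run_record(parameters, results, code_version):
    """Bundle the settings, environment, and a fingerprint of the final results."""
    return {
        "parameters": parameters,            # every input setting of the run
        "code_version": code_version,        # e.g. a source code control revision
        "python": sys.version.split()[0],    # interpreter version
        "platform": platform.platform(),     # operating system and hardware
        "results_sha256": _fingerprint(results),
    }

def reproduces(record, new_results):
    """True if a later run yields exactly the recorded results."""
    return _fingerprint(new_results) == record["results_sha256"]

if __name__ == "__main__":
    # Hypothetical run: the parameter names and values are made up.
    record = make_run_record({"width": 20, "seed": 7}, {"energy": -1.25}, "rev-42")
    print(json.dumps(record, indent=2))
    print(reproduces(record, {"energy": -1.25}))
```

Saving such a record next to each published number addresses both halves of the requirement: it says how the result was obtained, and it gives a later attempt something definite to be compared against.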

If research is not reproducible, it's not hard science. It's soft science. I am not 
asserting that any particular results in modern physics are not reproducible, but am 
rather saying that there is a substantial risk that many of them are not reproducible. 
This risk is qualitatively different than the unavoidable, unchanging risk that some 
individual results may be in error. Instead I am discussing the risk that many of the 
physicists who have adopted computers have not adopted the measures necessary 
to obtain some chance of reproducibility, and therefore are consistently obtaining 
and publishing unreproducible results. This risk may cause a qualitative change in 
the discipline as a whole, away from hard science and toward soft science. Such a
change could be easily interpreted as a change for the worse. 

I would like to distinguish two specific negative consequences of unreproducibility.
First, the physicist who has not ensured the reproducibility of her results simply
does not know how she obtained them; and therefore is unable to check their valid- 
ity. Publication of such results is equivalent to publishing a theorem without being 
willing to divulge its derivation (or even being sure of the derivation oneself). It is
simply unprofessional. 

A second consequence of unreproducibility is that there is no way to compare 
results or establish trends. Computers are extremely sensitive to their inputs; if one 
is not certain of the precise details of how a calculation was done, one can't be sure 
of even the orders of magnitude of the result. Researcher A obtained one result, 
and researcher B obtained another result, perhaps the same, perhaps different. 
Were they computing the same quantity, and if not how should the two quantities 
be related? Assuming that they are the same quantity (usually not true), should
one have expected the two results to be the same, or different? If different, what 
differences would one have expected? Were either of the two results obtained by 
a process without any significant mistakes or bugs? If the two results differ, is the 
difference significant? If the two results are the same, does that mean that they 
agree? None of these questions can be answered if either result is unreproducible. 
I am not saying that repeating these computations is necessary, but rather that 
knowing what you did is necessary to answer these questions. Where irreproducible 
computations are concerned, one cannot begin to discuss whether the "physical 
results" are in agreement with somebody else's results, because one really doesn't 
know what one's results represent. 

It is also impossible to extract meaningful information by finding trends in collec- 
tions of unreproducible results. It is well known in the software industry that people 
will fix software until it meets their subconscious or conscious expectations, whether 
or not the final result is correct [120,135]. This is not acting in bad faith; it con- 
sists simply of looking at unfamiliar results and saying "That's strange. It can't be 
correct." And then once one finds a result that looks half way right, one concludes 
that one has fixed all the problems. This process may be especially exaggerated in 
the physics community, which relies almost exclusively on global qualitative checks 
for validating its results. Therefore one could easily imagine a dozen different
researchers unwittingly reproducing roughly the same incorrect result. And there is no way
to detect such an error without reproducibility.

Related is the difficulty that at times physicists are forced to make choices which 
they know will reduce the objectivity of the experiment. For example, they may use 
software to deliberately throw certain "uninteresting" events away, and thus run 
the risk of new systematic errors. In such cases there is the constant risk that the 
research results are an artifact of the physicist and her process. Often scientists try 
to cope with this issue by doing Monte Carlo simulations of the cut process. Such
simulations can be useful for estimating the effects of a correctly performed cut on
a fully understood experiment. However they do not alleviate systematic risks that
the cutting software did not behave as expected or that the uncut data is not fully
understood. In fact such risks are compounded by the possibility of errors in the
Monte Carlo simulation. These risks cannot be managed if one doesn't know how
to reproduce and re-analyze the cutting algorithm and Monte Carlo calculation
which were used to arrive at the published results. 

Some may discount these arguments for reproducibility because "the object is 
to produce new results" [135], or because having a variety of different results that 
can't be compared is a sign of vibrancy and creativity. While the Nobel prize 
generally is not given for reproducing a result, people have been awarded it for
obtaining and expressing a very accurate and deep understanding of experimental 
results or of theory. This reflects the fact that hard science is about obtaining some 
sort of knowledge about the universe. We do this by checking and rechecking our 
results against reality, thinking it all over in mind-numbing detail, participating in a 
community which is both collaborative and mutually correcting, and thus gradually 
adapting ourselves and our thinking to the reality we meet in our experiments. Hard 
science is precisely the extreme where the scientist is most stringently required to 
conform and adapt to reality, where we stand or fall by how well we understand or 
model reality, not by how we change or influence it. It is precisely the fact that we 
are trying to understand an external reality that forces us to compare results, and 
to insist on reproducibility. We could do without reproducibility, but then we would 
not be doing hard science. Our results would be suitable only to remaking nature 
according to our wishes, not to understanding it as it is. 

Others may deny the importance of reproducibility for personal reasons: fear 
of public criticism and negative effects on one's career, desire for the power and 
influence that comes from possessing a code that is in demand, fear that others will 
rip off one's work without giving credit where credit is due, the hope of being able to 
barter or demand payment. These concerns all have a certain validity, but they are 
not in tune with the scientific vocation, or with the best interests of the scientific 
community. Some of these concerns are not an issue if the scientific community has a 
proper respect for computing; it should respect the fact that even conscientious and 
professional researchers use methods that are not nearly picture perfect and produce 
results that are not bug free. It should also give to authors of reused files and code the
same honor and citations that it gives to authors of seminal papers. On the other 
hand, if one wishes to profit from research by retaining possession of its results, then 
one should consider one's research to be private enterprise. Therefore one should 
not publish results obtained using the software which one wishes to retain control 
of, for such results are simply advertisements of this intellectual property under the
guise of scientific articles. And, of course, one should obtain funding for one's private
enterprise through channels other than those serving the physics community. This 
last sentence may apply to one's salary if one's current employer is not friendly to 
the idea of devoting one's working hours to obtaining supplementary income. 

A last point: ensuring reproducibility may help the author herself some years 
later [160]. Alan Karp [135] recounts: "My postdoc involved some calculations 
relevant to supernova models. I carefully backed up my work to tape before leaving 
for my new job. Eleven years later, a supernova went off in the Magellanic Clouds, 
and I was asked to redo the calculation for a model appropriate for this case. I 
contacted my former employer to get a copy of the program, data, etc. and was 
told that their records retention policy deletes all information after 10 years. I 
proceeded to rebuild the application from assorted printouts that I had kept, but 
they were from a number of different versions. The program ran and produced 
plausible output, but I could not reproduce the results I had published. I believe 
that the problem was inconsistent data sets, but I couldn't be sure, so I never 
published the results for the new supernova." 

9.4.2 Complexity 

The apotheosis of independence and reproducibility is physical law: rules that are 
claimed to hold constantly, throughout time and space, regardless of the existence 
of observers. These are the focus of physics. Physical laws, by definition, are
simple: they can be stated clearly, in a few equations, without exceptions. Their 
consequences may not be simple at all, but often the study of these consequences 
is considered to be not physics but applied science: chemistry, engineering, etc. In- 
deed, simplification is a hallmark of traditional physics: choosing simplified problems 
to analyze theoretically, choosing experiments that are conceptually very simple, and 
using approximations to force simplification. Everything is finite: experimental
apparatus can be described fully by engineering schematics, and equations can still be
expressed on paper. 

The introduction of complex computations to the scientific process can easily 
dilute or end this simplicity. One moves quickly from the intellectual purity of scien- 
tific principles and physical laws into the realm of "model building." Even conceding 
the conventional wisdom that all scientific laws are simply models, one cannot deny 
that the qualitative experience of verifying/falsifying a simple physical law like the 
blackbody spectrum is worlds apart from the experience of verifying/falsifying a 
complicated computational simulation of reality. In one case you are making a sin- 
gle decision about the validity of a simple entity, and the result can often be reduced 
to one bit: either the law has been falsified or it has not. In the other case you 
are making a huge number of comparisons and then using some rule to average 
their results. The results (plural!) will not be phrased in terms of falsification but 
instead as differences of the averages, and probability distributions of the differences 
between averages. 

Complexity and computers do not always end in this scenario. For instance, 
physicists were able to verify the existence of the top quark. They expended a 
huge amount of effort to reduce immense data sets down to a bump on a graph, 
and made an estimate of the probability that the bump was really there. Once the 
probability became large enough they began to say that the top quark exists. They 
found the needle in the haystack.

When the object is to study the haystack rather than the needle, do you still
have physics? In other words, when you are not trying to verify or falsify a particular 
point but instead are modeling or measuring all the details of a complex system, are 
you practicing physics or some other discipline? I am including collective behaviors 
like asymptotic freedom, collective resonances, and the fractional quantum Hall
effect as examples of needles because each of these individually can be verified or 
falsified. This question is not irrelevant: its answer defines the identity and mission 
of the physics community and the individual physicist. Having a clear identity and 
mission is key to success. 

Another more tangible problem is our fear of drowning in information. This fear 
often leads us to throw information away instead of archiving it for future reference, 
examine and fit only certain "representative" quantities rather than attempting to 
study the whole data set by distilling it with appropriate mathematical techniques, 
make visual comparisons between graphs when rigorous mathematical measures of 
agreement are available, and abandon attempts to verify/falsify in favor of pictures, 
graphs, descriptions, and numbers. In short, we do a qualitative, intuitional science. 
Some of these practices may be acceptable on a short term basis. This is a problem 
when we accept this state of affairs on a permanent basis and no longer aim for 
a science which is independent from the observer, mathematically rigorous, and 
conceptually simple. 

A more constructive approach to complexity is to systematize and manage it. 
Powerful software for managing the information glut is available both freely and 
commercially, most notably source code control software and relational databases.
Other software packages are specially designed to simplify and automate complex 
mathematical manipulations. Costs of media for archival purposes are constantly 
descending. A wide variety of both mathematical formalisms and also algorithms 
for analyzing complex data sets is under continuous development. If a physicist 
feels drawn to use shortcuts like those listed in the previous paragraph even on a 
temporary basis, probably the better choice would be to fight the impulse and either 
adopt or invent rigorous solutions like the ones I just listed. 

9.4.3 The Scientific Method and The Scientific Community

The scientific method, as taught in textbooks, is a prescription for developing knowl- 
edge by repeated confrontations of theory with experiment. The idea is that close 
observation of experimental reality and painstaking thought about it can, over time, 
provide a framework for discernment about which ideas are more or less faithful to 
reality. Without experiencing the shock of clear disagreement with experiment, 
scientists might never make the hard choices needed to obtain theories which are 
faithful to reality. 

In order to obtain a clear confrontation between theory and experiment, the 
following conditions must hold: 

• the experimental result is clear and unambiguous,

• given enough expertise, intelligence, and time, a unique prediction can be 
derived from theory, and 

• a confrontation may be made between the experimental result and the theo- 
retical prediction, obtaining a clear validation or refutation of the theory. 

In reality these conditions do not always hold. A large portion of the scientific 
practice is devoted to design and refinement of both experimental and theoretical 
techniques which enable a clear confrontation. 

The current era's scientific and technological successes are often explained with 
the scientific method. This account ignores the crucial role of the scientific commu- 
nity, which is likely more deserving of credit for progress than the scientific method. 
A full understanding of the reasons for scientific progress must rely on not just 
philosophical arguments but also historical and sociological studies of the scientific 
community [162]. 

The scientific method (clear confrontation of experiment and theory) is at risk 
whenever either theoretical predictions or experimental results are unreliable, am- 
biguous, or very complex. I have already discussed the complexity problem. It 
should also be clear that unreproducible calculations are completely ambiguous. 
Here I concentrate on the problem of computational unreliability, and consider only 
the problems it can cause for further scientific research. I classify these problems 
into two categories: Small Scale errors and Grand errors. 

Small scale errors require only man-months to man-years to correct. They are 
confined to research results that are not obtained at great expense and are not 
widely cited before being discovered and fixed. If one could not avoid most of them 
by using basic software disciplines, then they might well be construed as part of the 
cost of doing physics. 

Grand errors occur when an erroneous result is widely adopted, and can re-
quire years to decades to correct and have inestimable negative consequences for 
the scientific process. I here confine the discussion to grand errors associated with 
unreliable computations. These can be perpetuated through a kind of chain reac- 
tion. Once an erroneous result is known, people will have incorrect expectations for 
further results. As I explained in section 9.4.1, these incorrect expectations can be
expected to subtly determine the results of new calculations. As long as the physics 
community relies solely on global tests to check its research results, it has little 
defense against these chain reactions. In fact, it is impossible to ascertain whether 
any of these computation-related grand errors are currently in progress, simply be- 
cause the physics community does not conduct a systematic search for them using 
effective testing techniques like those alluded to in section 5.2.2. Software bugs
will not be found unless they are searched for. It is also very hard to tell whether 
small scale errors are happening, because there is neither an effective framework 
for finding them nor a system for reporting them. When an error is detected an 
erratum may be published, but errata very rarely document how the error occurred. 

If grand errors could not be avoided by adopting a discipline of thoroughly 
checking every computational result, then perhaps the cycle of their entrenchment 
and eventual correction could be interpreted as simply a realization of Kuhn's ideas 
about normal physics vs. paradigm shifts [135,162,163]. This hypothetical pos- 
sibility is purely academic: grand errors can be avoided by taking steps to ensure 
reproducibility and reliability and to manage complexity; therefore any grand errors 
that occur would signal collective failures of scientific discipline. Nonetheless, I want 
to point out a crucial mistake in such interpretations. Results obtained using soft- 
ware that has not been tested thoroughly have little connection with either theory 
or experiment, and permit no confrontation between the two. Therefore neither 
the scientific method nor the scientific process as described by Kuhn provides any
insurance against grand errors. Unless, of course, one is correctly managing the 
computational risks. 

Physicists may be tempted to discount the possibility of either small scale or 
grand errors, and claim the right to be a bit casual about data and computations. 
They may justify this attitude of presumed invulnerability with claims that they can 
tell incorrect data/computations when they see them [163]. Such an ability would 
have to be intuitive, because it is used to justify avoiding a quantitative or systematic 
approach to error detection. Physicists may also claim that the peer review system 
can be relied on to find errors. Because the peer review process relies on a subset 
of the global checks practiced by individual physicists, relying on it alone to detect 
errors just amounts to a re-assertion that physicists can tell incorrect results when 
they see them. Or physicists may claim an elite status and argue that this makes 
them less vulnerable to error, which again reduces to the same argument. 

A reliance on intuition at the expense of discipline is not fitting in any professional,
no matter how experienced. Moreover, a correct professional intuition does not 
come from a vacuum, but is built on a foundation of a lot of training, experience, 
and discipline. One must question whether such training and experience are present 
when the typical physicist has little or no knowledge and experience of the basics 
of managing computing risks. Moreover, even experienced software development 
professionals are constantly over-optimistic about software; this is precisely why 
they have adopted systematic and exacting practices. Human intuition ALWAYS 
fails when evaluating how unreliable software is; it is always overoptimistic, and the 
only way to get a true knowledge of your software is to methodically test it. 

I repeat my overall point: the traditional practices for ensuring the success of
the physics community, including but not limited to the peer review process, can be 
expected to fail both on the small scale and on the grand scale if proper steps are 
not taken to manage computational risk. 

It is possible that by good luck some grand errors have already been detected, 
but are not widely known. Leo Kadanoff [164] writes about a possible candidate
which started in 1992 and lasted about five years: "The whole early history of sin- 
gle bubble luminescence required step by step work to eliminate provocative but 
incorrect mechanisms. A set of early experiments by Barber et. al reported very 
short widths for the emitted pulse of light. This short width then opened the door 
to novel mechanisms for explaining the total intensity of the emitted light. Later 
developments suggested that the short pulse width was a misstep by the experi- 
mentalists. In contrast to the excellent work on sonoluminescence in the post-1997 
period, reported above, this misstep led the simulators and theorists quite astray. 
A host of incorrect speculations and mechanisms ran through the field, intended 
to explain the 'observed' behavior. Despite one essentially correct simulation, the 
pre-1997 simulations did almost nothing to weed out these incorrect discussions, 
undercutting ones[sic] hope that simulations might provide a good tool for such 
weeding.... Instead the speculations continued unhindered until an experiment by 
Gomph and coworkers showed that the pulse width was much longer than previously 
believed. This implied a lower temperature for the emitting drop. After this, atten- 
tion turned away from the incorrect mechanisms so that- as reported above- theory, 
experiment, and simulation began to produce a consensus about what was going 
on. The examples of the Oak Ridge paper and some of the earlier sonoluminesce 
simulations suggest that the models might have been directed toward the wrong 
goals. Apparently, rather than being used for a process of checking, criticism, and 
elimination of incorrect possibilities they were often used to support and exemplify 
the presumptions of the scientists involved. A program of modelling should either 
elucidate new processes or identify wrong directions. Otherwise, there is no point 
in carrying it out." 

I have not studied the papers which Kadanoff cites, and can only comment 
on the incident as described in his report. He does not say whether the original 
experimental misstep could have been caused by a failure in some computerized 
experimental control or data analysis. If so, the whole five year incident would be due 
to a failure to manage computing risks properly. Even if the original experimental 
error had some other cause, one is still left to conclude that proper computing 
discipline could well have ended the incident much more quickly. 

In any case this incident seems relatively small. It remained confined to one
narrow discipline, and lasted only five years. A citation search on the ISI Web of 
Science [165] indicates that only fifty-six articles cited the original erroneous result 
before Gomph et al published their correction. One could easily imagine worse; 
for example, a computationally caused experimental error in a cosmological exper- 
iment like COBE [119] could profoundly affect our understanding of the universe 
and the development of particle theory, or a string of erroneous theoretical compu- 
tations could lead the high Tc research community to either erroneously consider 
an incorrect mechanism for superconductivity or abandon a correct mechanism. 

9.5 Consequences

So far I have discussed risks of a rather esoteric and intellectual nature: the risk 
that a published result may be unreproducible, the risk that a published result may 
be wrong, the risk that someone's reputation might suffer, the risk that physics may 
not progress. Another risk - of ethics and professionalism - may also seem debatable 
or inconsequential to some, although I would contend that it is the most serious of 
all. 

There is another set of risks which few would care to debate, and which
I here call consequences to emphasize that they cannot be considered esoteric
or inconsequential in any sense. Here are some of the possible consequences for
the physics community as a whole of not taking steps to manage computing risks:

1. Wasting large sums of other people's money. 

2. Difficulty in obtaining increases in funding, or even maintaining present levels 
of funding. (Once bitten, twice shy.)

3. Leading society as a whole or smaller groups in wrong directions. 

4. Receiving less respect from others than we might, or even losing others' re- 
spect. 

5. Losing credibility. 

6. Not being able to influence (national) policy makers or society. (Why should 
anyone pay attention to advice that comes from unreliable computations?) 

7. Failing to speak with authority to issues of grave importance to real people. 
(Authority would of course include knowing exactly what the possible errors 
are and keeping them to an absolute minimum.) Topping society's priority 
list (not necessarily ours) are things that can make a life and death difference: 
global warming and environmental concerns, cryptography, the reliability of 
potentially deadly machinery like planes and nuclear power plants, advances 
in technologies that can prevent or cure illnesses, and military technology. 
Also at the top of the list are economic concerns, for instance advances in 
solid state (especially semiconductors), high temperature superconductors, 
unreliability of power grids, et cetera. 

We must remember that our honored place in society wasn't given to us as a God- 
given right: our predecessors earned it by their contributions to World War II and 
kept it through their continuing contributions to industry, most notably in solid
state. We will not retain our status or support if we do not continue to make new
contributions of similar importance. If we can't contribute, we are "unimportant on
the overall scale of things (compared to war, pestilence, economics, etc.)" [126].

These are the real consequences of poor choices about computing. It is interest- 
ing that in the software industry the care and expense devoted to getting software 
right is directly related to the possible severity of retribution from clients.
Probably the most care is given to flight control systems. Farther down the list come
vendors of enterprise software, where data and services are worth billions of dollars
and individual software contracts involve millions of dollars. Last on the list come
the developers of software marketed to the individual user. They issue a notice with
each copy of the mass produced software, with text very similar to this text, taken 
from the GNU copyleft (the caps are not mine): 

"EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITH- 
OUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, IN- 
CLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER- 
CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE 
RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS 
WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME 
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED 
TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY 
WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMIT- 
TED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GEN- 
ERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT 
OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT 
LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR 
LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE 
PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH 
HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 
SUCH DAMAGES." [7] 

Who are the physicists' clients? Are they other physicists? Then probably 
physicists have even less to fear from their clients than do developers of software 
marketed to the individual user. Do physicists want this sort of disclaimer to apply 
to their research results? It will be fascinating to see how the physics community 
chooses to answer this question. 

9.6 Recommendations and Resources 
9.6.1 Professionalism 

I believe that a proper response to the risks of computation starts with recognizing 
their severity and gravity, choosing to regard computation as an integral part of 
one's research rather than as an inconvenient add-on, and developing a deeper 
professionalism and scientific discipline. Each of these steps can be summed up in 
one word: professionalism. Professionalism implies the following steps: 

• Choose to recognize and pay very close attention to the full extent of the 
risks inherent in computing. This includes cultivating a professional scepticism 
about the results of computations, whether done by oneself or by others. Such 
scepticism should be satisfied only when an author gives cogent and thorough 
arguments for why her result should be trusted. 

• Consider one's computing activities, whether writing software or using it, to 
be just another part of one's physics reasoning and research, on par with 
deriving equations or doing experiments. If one would prefer to not give the 
same interest and attention to one's computations that one gives to one's 
equations or experiments, then the appropriate choice would be to minimize 
the computing in one's research and to absolutely avoid publishing any results 
that depend on some computing step for their validity. For example, this
would include avoiding publishing theoretical formulas derived using symbol
manipulation software like Mathematica and Maple unless one is willing to
check the results thoroughly and to keep exact records of the commands 
and input that were used to obtain them. The choice to minimize or avoid 
computing can be a valid choice; it is simply a choice of specialization. 

• The rest of this list will presume that the reader is not willing to forgo compu- 
tation. At the risk of stating the obvious, I remind such readers that publishing 
a result implies that one is confident that the result is not in error, and also 
implies that one has obtained this confidence only through a detailed and 
thorough checking process. In other words, an author must take full responsi- 
bility for her published results. Unfortunately it can be very tempting to make 
exceptions for computational results; to blame others for having written buggy 
software, to claim that software risks do not need to be evaluated or mini- 
mized because they are implicitly acknowledged whenever a computational 
result is published, et cetera. Therefore I encourage the reader to remind
herself of her responsibility for the correctness and reproducibility of all her
published results, including ones obtained with the help of computational results
or software supplied by another party. This implies choosing methods
which have the minimum vulnerability to computational risks and recording
all the information required for reproducibility. It also implies following the
testing guidelines briefly sketched in section 9.2.2 and doing detailed cross
checking of the results that one publishes.

• In your papers, give a balanced, professional evaluation of your results. You 
should know better than anyone else the things that could invalidate your 
results, as well as the uncertainties; document these in your papers. A pro 
and con approach may be helpful. 

• Adopt Buckheit and Donoho's statement [166] that "An article about computational
science in a scientific publication is not the scholarship itself, it is
merely advertising of the scholarship. The actual scholarship is the complete 
software development environment and the complete set of instructions which 
generated the figures." Make all your files freely available to those who want 
to reproduce your results, or to build on them. This can be done either on a 
per request basis, or on a charge-free public server like the Computer Physics 
Communications Program Library [145], the SourceForge web site [146], or 
Mathematica's archive [167]. One should attempt to make one's scripts, code, 
and configuration files be well commented and readable. One should also en- 
sure reproducibility on one's own computer and give some degree of assistance 
to would-be users. However in many cases one should not feel obligated to 
write much documentation, provide all the support necessary to make one's 
configuration files, scripts, or software work on another's computer, or explain 
how to do calculations. If there is a real concern on the part of the scientific 
community about the reproducibility of one's results, that is a different matter 
and full support for the community's efforts should be supplied. 

• Manage your risks. Do not feel that you have to achieve 100% mathematical 
proof certainty about the validity of a computational result; this is impossible. 
Do analyze the risks and build a convincing case for your result's validity. Risk 
management is a well developed discipline [168]; but perhaps it can be boiled
down to analyzing both the risks and the costs and benefits of mitigating 
them, planning and execution of appropriate steps to lessen them, doing the 
cross-checking necessary to figure out when a risk has become a reality, and 
fixing failures in a timely fashion. Obviously the extent of one's efforts to 
avoid errors will depend on the situation [155,163]. Nonetheless it would be 
very unprofessional to use situational arguments to justify either avoiding risk 
analysis and management (perhaps on the grounds that they are too costly) 
or publishing a result that one is not sure of. 

• Software always contains bugs; one can expect that even the most conscien- 
tious researcher will publish results that contain mistakes and inaccuracies. If 
a physicist has worked conscientiously for reproducibility and reliability, then 
her errors should not be taken to reflect on her quality as a scientist. In- 
stead of hiding or ignoring one's published mistakes, one should try to find as 
many as one can and inform as wide an audience as possible about both the 
mistakes and their corrections. The search for errors will necessarily include 
supplying contact information with the original article and following up in a 
timely fashion on any reports of specific errors. Informing the public will in- 
volve both documenting the error on the Internet as soon as it is confirmed 
and publishing an erratum in the original journal.

9.6.2 A Technical Toolbox 

• The way to master complexity is not to throw information away but instead 
to use software designed for archiving and managing data. This very powerful 
strategy allows you to find out the things you need to know when you realize 
you need to know them. In particular, source code control software is ideal 
for keeping track of files that can be read by humans and are subject to 
revision. It may also be useful to learn to use data bases, particularly if one's 
calculations involve many steps or process or generate a lot of data. One can 
obtain a reliable and free data base or source code control program on the 
Internet. 

• Sometimes mathematics programs like Maple, Mathematica, FORM, PARI-GP,
Matlab, and Scilab can tame complexity [169]. However, like any
other software, the validity of their results should not be assumed, especially 
in the cases where test suites and bug databases are not freely available to 
the public. 

• Wherever possible, simplify. Given a hard problem, abstract it into simpler, 
toy problems, and understand them thoroughly before attacking the original. 
Simple programs are usually better. 

• Entertain doubts about the practicality of the reductionist program where 
everything is a sum of its parts, and if only I could calculate the problem 
at infinite resolution I would get the right answer. The last forty years of
physics give a very clear indication that this is often not true. Moreover, as
we have seen, the computational approach is problematic. Spend extra time
looking for other ways of modelling. 

• A computer program should, if at all possible, just refine a result you can
estimate independently. Or at least it should give known results in some 
limit. It is very hard to justify trusting a calculation that one has no way of 
checking. 

• Whenever possible, reuse scientific codes written by experts; many of these 
are freely available on the Internet. But test them too. Don't believe in magic 
bullets; to single out one of the most recent ones, open source code that is 
not thoroughly tested and is not widely and thoroughly used is not likely to 
be more reliable than any other untested software.

• If at all possible, use scripts and configuration files to automate all the steps 
necessary to obtain your computational results. Most software does allow 
scripting, whether with batch files, with macros, or with similar technologies. 
The reason for automating a calculation is that it simplifies reproducibility 
down to two tasks: (1) keeping archives of all the necessary files, preferably
using source code control, and (2) recording the software configuration, in- 
cluding the version of the operating system and of any other software packages 
that were used. If unable to record all the necessary information in scripts and 
files, conscientiously record every detail in a lab notebook, keeping in mind 
that even the most conscientious human has immense difficulty achieving re- 
producibility without technical aids like scripts and source code control. Even 
when archiving everything, a lab notebook can be very useful anyway. 

• Upon publishing a result, make a frozen copy of all the files that were used to 
do the calculation, and leave the copy unchanged as it was upon publication. 

• Keep regular backups, including the source code control database.

• Consider putting your files (software, settings, scripts, tests, and documentation)
in a public open source repository. Consider licensing them with the
GNU General Public License [7], which will allow them to be freely reused by the
public.

• Consider practicing the discipline of producing "fully reproducible research." 
This consists of automating the entire process of creating your papers (com- 
putations, graphing, and type setting) and then making the whole package 
available to the public [38,166,170-174].

• When checking software or its results, try to do it in an automated way, by
writing test scripts or test programs. Then you can test each version of your 
input files and the software in an automated and reproducible way. Run tests 
repeatedly as you make changes (even minor ones) to the calculation. If 
unable to automate the tests, create detailed records of each test and its 
outcome each time that it is performed. 

• Inform yourself about good testing practices [120, 175]. 

• Inform yourself about the basics of numerical analysis [50, 176, 177]. 

• Inform yourself about the field of verification and validation [121-124]. 

• Do not assume that errors are uncorrelated or governed by a Gaussian distribution.
Rob Easterling [156] writes: "Never assume that probability distributions
are exactly known. Include an analysis of the effect of uncertainty
in assumed distributions. Know the difference between random variables and 
imprecisely estimated unknown constants and perform analyses that reflect 
this difference." 

• When one uses a computer to solve a mathematical problem, often the chal- 
lenge of estimating the accuracy of the computer's solution via an analysis of 
the solution technique turns out to be even more challenging than solving the 
original problem. There is a growing field called uncertainty quantification 
which is devoted to estimating software accuracy without looking at all its 
internals [178-184]. 
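
Two items in the list above (a program should reproduce known results in some
limit, and tests should be automated and rerun after every change) can be
illustrated with a minimal Python sketch. The trapezoid-rule integrator and the
test names are hypothetical stand-ins chosen for brevity and are not code from
this thesis. The point is only the shape of the test script: one check against
a case where the method is exact, one against an independently known analytic
answer, and one on the expected convergence rate.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal subintervals (illustrative only)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_known_limit():
    # The rule is exact for linear integrands: the integral of 2x + 1
    # over [0, 3] is 12, whatever the number of subintervals.
    result = trapezoid(lambda x: 2.0 * x + 1.0, 0.0, 3.0, 7)
    assert math.isclose(result, 12.0, rel_tol=1e-12)


def test_independent_estimate():
    # The integral of sin over [0, pi] is known analytically to be 2.
    assert abs(trapezoid(math.sin, 0.0, math.pi, 1000) - 2.0) < 1e-5


def test_convergence_order():
    # For a second-order method, halving the step should reduce the
    # error by roughly a factor of four.
    err_coarse = abs(trapezoid(math.sin, 0.0, math.pi, 50) - 2.0)
    err_fine = abs(trapezoid(math.sin, 0.0, math.pi, 100) - 2.0)
    assert 3.8 < err_coarse / err_fine < 4.2


def run_all_tests():
    """Run every check; any failure raises AssertionError."""
    tests = [test_known_limit, test_independent_estimate, test_convergence_order]
    for test in tests:
        test()
    return len(tests)


if __name__ == "__main__":
    print(run_all_tests(), "tests passed")
```

A script of this kind can be rerun unchanged after each modification of the
code or its inputs, which is what makes the testing reproducible.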

9.6.3 For Collaborations and Recipients of Research Grants 

When more than one person is involved in a computing project, or extra funds are 
awarded to it, its intrinsic challenges are more difficult, and the obligation to not 
risk delays or failure and waste time and other people's money is even more binding.
Therefore the researchers should go beyond the professional standards outlined
in section 9.6.1 and adopt the reliability and reproducibility disciplines
which are standard in software engineering. These include many of the technical
recommendations in section 9.6.2, but also include going through a whole software
development cycle:

1. Finding out the requirements of the prospective users. 

2. A design stage which produces a detailed specification of the software's ar- 
chitecture and implementation. 

3. Writing software, test suites, and documentation concurrently. Design changes 
should be reflected in the specifications. 

4. Making the code available to customers before the final release. 

5. A period of testing, client input, and stabilization. 

6. Ongoing bug fixing after the release. 

Obviously how this process is implemented will change a lot between a two-person 
low budget project and one involving dozens of people and millions of dollars. How- 
ever the rudiments of this process should be there in all cases. 

It may be appropriate to adopt an interdisciplinary approach and employ software 
experts to contribute their expertise to the project. Remember that in scientific 
computing both physics expertise and computing expertise are needed; preventing 
physicists from coding would be a mistake if they are willing to practice the necessary
discipline. Even in large projects it could be sufficient to let physicists do the
whole software cycle, as long as they are advised by and held accountable to software
professionals who review their work in detail. 

9.6.4 For Universities and Research Institutions

• Consider computing an integral part of physics research. Require that researchers
practice the professionalism described in section 9.6.1. Reward them
for doing so. Strongly encourage researchers to make their files freely available
on the Internet; require that at the very least they be made freely available
on a per request basis.

• Educate researchers about the difficulties of computing, professional discipline, 
and best computing practices. Foster and reward expertise in software testing, 
managing complexity, numerical analysis, verification and validation, software 
design, coding skills, scripting, triage, and risk management. Task particular 
staff with supporting and educating the researchers who are involved with 
computing. When possible, create scientific computing interest groups. 

• Provide a source code control server that gives researchers the option of 
making their projects visible on the Internet or instead keeping them private. 
Provide a public bug tracking web server where researchers can promptly 
document errors and corrections to their published results. Back up these 
servers, and provide additional backup services that are easy to use, able 
to store all of the researcher's files, and can be automated to run regularly. 
Allocate staff part time or full time to maintaining and troubleshooting these 
servers, software, and procedures. 

• Pressure vendors of commercial scientific software to distribute comprehensive 
test suites with their software, and to publicly document both their test suites 
and each bug report, confirmed or not. Vendors should also create a web site 
where users can communicate to each other the bugs they run into and the 
fixes they find without first verifying these bugs or fixes with the vendor. 

• Ensure that physicists are honored and rewarded when their computing con- 
tributions are adopted and reused by others. Determine whether such reuse is 
occurring and evaluate the quality of the contribution by soliciting information
from users, not by asking the author. Discourage the practice of trying to 
sell or barter one's software, or of holding on to it in hopes that someone 
will eventually want to pay for it. The physicist should be rewarded by her 
employer and by the physics community for the software, and should not have 
to start looking for clients. 

• Reward researchers who quickly check reports of errors in their computational 
results and quickly post information about confirmed errors on the Internet. 

9.6.5 Journals 

• In so far as possible, require the professionalism described in section 9.6.1.
Require that peer review no longer rely solely on global qualitative consider- 
ations. Require that each computing result be accompanied within the same 
paper by a clear and cogent discussion of pros and cons about that result's re- 
liability and reproducibility. Require that the author provide the referees with 
all the files necessary to reproduce the results, and require that the referees 
check that the author has in fact taken sufficient steps to be reasonably confi- 
dent that the result is reliable and could be reproduced if desired. Encourage 
referees to comment on every aspect of the quality of the computing. Encourage
the referees to (when practical within a limited time) learn about the
computations by running parts of them and experimenting with parameters. 

• Include computational standards in the written description of the journal and 
its mission [158]. The most stringent example so far may be the ASME 
Journal of Fluids Engineering [126]. Its Policy Statement on the Control of 
Numerical Accuracy [185] states: "The Journal of Fluids Engineering will not 
consider any paper reporting the numerical solution of a fluids engineering 
problem that fails to address the task of systematic truncation error testing 
and accuracy estimation. Authors should address the following criteria for 
assessing numerical uncertainty." Ten specific points follow. Also see the
Computer Physics Communications journal, which was discussed in section 9.3.5.

• Consider requiring the same quality standards from computational results that
are required from experiments [158].

• Require that authors share the files which they used to obtain their results 
freely on at least a per request basis, and quickly document any errors on the 
Internet. If one receives a reliable report that an author has failed to do so, 
blacklist her. 

• Restructure the publication process to allow immediate postings of errata, 
and multiple errata pertaining to the same article. 

• Create a web site for files and code which serves as twin to the journal. Allow 
authors to twin their articles with the files and code necessary to reproduce 
their results. These files must not be put under the publisher's copyright, but 
instead made freely available for reuse under a copyleft license like the GNU
General Public License [7]. After initial peer review and publication, allow the author to post
fixes to the files, but always retain the original version and make it clear that 
only the original version was reviewed. There is some potential for abuse here, 
but it is likely small if the author wishes to remain in good standing with the 
scientific community. 

9.6.6 For Funding Agencies and Professional Organizations

• In forming nationwide research agendas, these agencies should clearly acknowledge
the risk that major individual research results could be totally
invalidated by computational errors, and that strings of errors in less important
results could lead to substantial errors on the part of the physics
community. These risks should be analyzed in detail, and detailed plans should
be formed to actively control and minimize them.

• In section 9.2.4 I argued that software is better understood as a form of
discourse like text or equations than as a machine or mechanism. If so,
software should become part of the subject matter of our scholarly discourse;
in our publications we should be as ready to discuss the details of our software
as we are to discuss details of a theoretical technique or an experiment. 

• A researcher who acts quickly to confirm reports of her published errors and
keeps comprehensive information about them on the Internet should be re- 
spected and rewarded. A researcher who is not willing to publish written 
justifications for her computational results and share the files used to obtain 
them should be penalized. 

• Require that funding proposals which include plans to use computers analyze 
the specific problems that could occur, describe the process and resources
which will be used for designing, implementing, and maintaining the computation,
and give a detailed plan for how risks will be minimized. Use these as
a baseline requirement; i.e. projects will not be considered to have scientific 
merit unless they give convincing evidence that they have thought through 
the computation and will take the steps which are necessary for reliability 
and reproducibility. In particular, no high profile research project should be 
funded unless it clearly has adopted a very stringent software development 
cycle and very stringent verification and validation procedures, comparable to 
the most exacting software development projects, and has included the expert 
staff needed to follow through. Assessments of project success and failure should
also evaluate how the computing turned out.

• Funding agencies should hire software experts to participate in evaluations of 
computational aspects of funding proposals and project results. These same 
experts should contribute to any formal guidelines within the agencies. 

9.7 Personal Experience 

I wrote the first draft of this article a year and a half ago, and then decided to 
implement my own recommendations as a test of their validity. Therefore I began 
keeping all my handwritten calculations and observations in notebooks, with each 
page signed and dated. This has turned out to be very useful because now I have 
good records of what I did, and when I want to publish a calculation I just need 
to go to my notebooks. I put both the source code for the software I was writing 
for my research and the texts of the articles I was writing in source code control, 
and ended my day by recording the day's changes in source code control. And I 
did backups of all my files every once in a while, perhaps averaging once a month, 
perhaps less. All of these steps were easy to implement; they only required a little 
conscientiousness. 

Other steps were a bit more difficult. Most of my research over the last year and 
a half has been devoted to developing software to check whether certain algorithms 
could be applied to disordered systems, and therefore I had to manage the challenges 
which face scientists who write software. I decided that I would not publish a 
computed result unless there were automated test suites checking each step in 
producing that result, excepting the steps of printing out and graphing the final 
numbers. I also decided that my software should automatically run consistency 
checks on the results it produces. Roughly half of my code ended up being testing 
code; I estimate that about half of my programming time was also spent writing tests. 
I consider this cost well worth it because non-automated tests of the same quality 
are likely impossible and would anyway have consumed far more time because I 
would have had to repeat them each time I changed my code or configuration. My 
final published results were the result of months of computer time, and during those 
months I found it necessary to make numerous changes to the code. If I had not 
been able to test my code thoroughly after each change, I would have no reason to 
be confident in the validity of my final results. 
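
As an illustration of what an automatic consistency check on a computed result might look like, here is a minimal sketch for the matrix step function (density matrix). It is not the code used for this thesis; the function names and the tolerance are assumptions chosen for the example. It relies only on general properties of a step function of a symmetric matrix: the result is symmetric, it squares to itself, and its trace is an integer.

```python
# Hypothetical consistency checks for a computed density matrix
# (matrix step function) stored as a list of rows.

def matmul(a, b):
    """Dense product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_abs_diff(a, b):
    """Largest element-wise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def check_density_matrix(p, tol=1e-10):
    """Return the list of failed checks (an empty list means all passed)."""
    failures = []
    n = len(p)
    # A step function of a symmetric matrix is itself symmetric.
    transpose = [[p[j][i] for j in range(n)] for i in range(n)]
    if max_abs_diff(p, transpose) > tol:
        failures.append("not symmetric")
    # A step function only takes the values 0 and 1, so P*P = P.
    if max_abs_diff(matmul(p, p), p) > tol:
        failures.append("not idempotent")
    # The trace counts the states below the step, so it is an integer.
    trace = sum(p[i][i] for i in range(n))
    if abs(trace - round(trace)) > tol * n:
        failures.append("trace is not an integer")
    return failures

# Exact projector onto the vector (1, 1)/sqrt(2): passes every check.
good = [[0.5, 0.5], [0.5, 0.5]]
# The same matrix with one corrupted element: fails two checks.
bad = [[0.5, 0.5], [0.5, 0.6]]

print(check_density_matrix(good))
print(check_density_matrix(bad))
```

Checks of this kind are cheap compared to the calculation itself, so they can be run on every result rather than only inside a test suite.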

There was also the challenge of keeping track of the configuration parameters 
and input data of each individual step in the calculations. I required that every 
step in the calculations be controlled by a unique file (not human input, and not 
a shared file) which contained a complete listing of the configuration parameters 
and the names of the input files. I adopted a standard format for these control 
files, which was a good thing because in the end there were almost twenty thousand 
of them, associated with the various steps in the several months of calculations. 
Perhaps a physicist who is not involved in such intensive computing might not 
end up managing twenty thousand files, but probably she will redo even a simple 
calculation many times with different parameters to see how things change, or 
do many individual steps (for instance, when using Mathematica to manipulate 
formulas). Keeping track of such calculations is about as challenging as what I was 
doing: the first time is the hardest time. 

There are some decisions about configuration and input management that I 
would make differently now. I would store my configuration parameters in a human 
readable format like the XML (Extensible Markup Language) industry standard 
instead of a binary format. I made a poor decision to store configuration data in the 
same file with the results produced by those configurations, which not only caused a 
persistent confusion of semantics but also made certain tasks (like throwing away the 
results and recomputing, or recording a copy of the configuration files) much more 
difficult than necessary. And eventually I implemented much of the functionality 
of a database; i.e. a standard type of software whose sole design goal is to assist 
with the management of large quantities of data. Even though many databases 
might have had some difficulty accommodating my need to develop the software and 
its configurations incrementally, the most popular databases are pretty well tested 
and certainly provide better solutions to many of my problems than I was able to 
implement myself. Therefore I would consider the possibility of using a database, 
particularly if I anticipated managing hundreds of files or more. 
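
The control-file discipline described above, combined with the two changes recommended here (a human readable format, and configuration kept apart from results), could be sketched as follows. This is an illustration under stated assumptions, not the format used for this thesis: the element names, file names, and parameter names are all hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def write_control_file(path, step_name, parameters, input_files):
    """Write one step's complete configuration to its own XML file."""
    root = ET.Element("step", name=step_name)
    params = ET.SubElement(root, "parameters")
    for key in sorted(parameters):
        # Values are stored as text; repr() keeps full float precision.
        ET.SubElement(params, "parameter", name=key, value=repr(parameters[key]))
    inputs = ET.SubElement(root, "inputs")
    for name in input_files:
        ET.SubElement(inputs, "file", path=name)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

def read_control_file(path):
    """Read back the step name, parameters (as strings), and input files."""
    root = ET.parse(path).getroot()
    parameters = {p.get("name"): p.get("value") for p in root.find("parameters")}
    input_files = [f.get("path") for f in root.find("inputs")]
    return root.get("name"), parameters, input_files

# One control file per step; the results go to a separate file, so they
# can be thrown away and recomputed without touching the configuration.
workdir = tempfile.mkdtemp()
control_path = os.path.join(workdir, "step_0001.control.xml")
results_path = os.path.join(workdir, "step_0001.results.dat")

write_control_file(control_path, "step_function",
                   {"disorder": 0.5, "width": 20}, ["matrix_0001.dat"])
name, parameters, input_files = read_control_file(control_path)
print(name, parameters, input_files)
```

Because the control file is plain text, it can be inspected by eye, compared with other control files, and kept under source code control alongside the code.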

My investment in reproducibility did serve me well: 

• After an initial study of the matrix step function (density matrix), I decided 
to study other matrix functions as well. Because I had full records of how I 
produced the step function results, I was able to quickly configure my software 
to study other matrix functions, and then to make detailed comparisons of 
the matrix functions, knowing that I was comparing apples to apples. 

• On a few occasions, I found reason to doubt the validity of certain results. 
For instance, in a particular data set of 33 matrices, twelve or so showed 
much different results from the rest. I reran the calculations with the same 
configuration file as before, and the discrepancy disappeared. This sort of 
unreproducible, untraceable failure is well known in computing, and has been 
termed a "Heisenbug." The only solution is to use all of the following: running 
automatic consistency checks on computational results, checking the results 
manually, keeping full records of how the results were produced, and, when 
a calculation turns out suspiciously, rerunning, examining, and retesting each 
of its individual steps. 

• I have been able to distribute on the Internet a full copy of all files necessary 
to reproduce the graphs, numerical results, and typeset copy of my research 
articles. This may help others understand what I have done, or adapt my 
work to obtain new results, or even check the validity of my results. I too 
may profit, whether by being able to take up a research project where I left 
it off some years previously, or by an increased standing with any researchers 
who use my code. 

I began development of this code in a Linux emulator named Cygwin which 
runs on top of Windows. My first publication was accompanied by files which 
worked in that environment, with the hope that they would work in any other Linux 
variant after only minor changes to the compilation process. Later I converted 
to Linux and found that simply changing the compilation process was insufficient: 
subtle differences between the two environments required that certain scripts be 
altered and file names be changed. This confirms the universal experience that it 
is impossible to ensure that software will run correctly on a new platform without 
actually installing and testing it [163]. I conclude that it is not practical for a 
physicist to distribute her configuration files and code with a guarantee that they 
will work for colleagues with only the most minor changes. I believe that it is still 
important to distribute these files, but for other reasons: because they are an integral 
part of one's research publications without which the publications lose their status 
as scientific works, and because they can be a considerable aid to other physicists 
who are interested either in reproducing one's own results or else in producing new 
results of their own. 

I originally planned to write detailed specifications of the scientific software and 
test suites, and to keep these specifications updated to match design changes during 
implementation. I also planned to write all of the code very clearly and to add lots of 
comments. While I believe that most of the current code is concise and clear, there 
are exceptions, particularly the code which does the final data output and the test 
suites. Moreover there are few comments, especially considering that in professional 
codes every function should be preceded by an explanation of its purpose, inputs, 
and outputs, and should also contain comments around each logical step of its 
execution. I also abandoned the idea of writing specifications at a very early stage, 
when it became clear that I did not have a precise understanding of how to achieve 
my design requirements and would have to proceed by incrementally rearchitecting 
portions of the code. This is actually a common problem in all software development 
projects, and in general does not excuse skipping the specification process. However 
I felt that in this one-person development project I could rely on detailed test 
suites and on rereading my own code to be partial substitutes for specifications and 
comments. For the most part, this approach seemed to work, but it did exacerbate 
some subtle semantic problems. I conclude from this experience that single-person 
software development does not necessarily need to be accompanied by specifications 
and thorough commenting. However I believe that once two or three persons are 
collaborating, these practices become much more important and cannot be avoided. 

I have discussed several disciplines of varying difficulties which I adopted in 
order to achieve reproducibility and reliability. Yet there was another discipline that 
was far more difficult and painful than the others even though it saved both time 
and resources. This was the triage process. I had to continually prioritize the 
things I wanted to achieve, choose to implement new features only if I had already 
sufficiently tested the existing features and expected to be able to test the proposed 
new features as well, and work on tasks in a prioritized order instead of according 
to my whims. When things went slower than I would have wished (always), and 
I did not manage to thoroughly test some features and therefore did not have 
a good reason to trust the corresponding results, I had to be willing to omit the 
unchecked results from my publications. Indeed there was a constant and sometimes 
almost overpowering temptation to publish the results anyway on the grounds that 
they were "good enough," despite years of experience in the computing industry 
and plentiful experience of the negative consequences of such wishful thinking and 
acting. I believe that the only thing that saved me was reminding myself that both 
my personal honesty and my standing within my professional community were at 
stake. Even when I was able to confidently publish a result, I had to acknowledge the 
uncomfortable fact that the calculation could still be wrong despite my best efforts, 
and I had to give a somewhat detailed discussion of the risks as I understood them. 
Therefore I have a lot of sympathy for the physicist who, less informed about the 
risks of computing, makes poor choices for managing those risks. 

Chapter 10 

Unquenched QCD and Disordered Systems 

I here briefly review the current lattice gauge theory algorithms for simulating 
fermions and the relations between lattice gauge theory and mesoscopic physics. 

10.1 Fermionic Algorithms 

I assume that the reader is familiar with quenched (no dynamical fermions) lattice 
QCD [186-189], and is also aware that unquenched calculations include fermions 
by including the determinant of the Dirac operator in the QCD partition function. 
Either this determinant or its derivative must be evaluated quite often, and each 
evaluation requires a huge amount of computer time. But even worse, current 
algorithms show a very rapid growth in computational time as the fermion mass is 
decreased, and therefore the fermion mass is kept at artificially high values. (See 
these recent reports from major lattice QCD collaborations [190] for the currently 
observed behavior.) In current quenched and unquenched calculations this forces 
people to use masses of the up and down quarks which are four or five times their 
physical values [191,192]. Faster algorithms would be very useful. 

The most commonly used lattice QCD algorithm, Hybrid Monte Carlo, evaluates 
the fermionic determinant stochastically by introducing new degrees of freedom 
which evolve according to forces reflecting the gauge fields [112]. These forces 
are calculated by solving a linear system Ax = b, and in fact the bulk of the 
computational time in modern calculations is spent in repeated solution of this 
linear system. 
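
To make the repeated linear solve concrete, here is a minimal conjugate gradient sketch. It is an illustration only: the operator is a toy one-dimensional periodic lattice Laplacian plus a mass term, chosen as an assumption for the example, whereas in lattice QCD the operator would be built from the Dirac operator (typically in a Hermitian positive definite combination). The function names are hypothetical.

```python
def conjugate_gradient(apply_a, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the guess x = 0
    p = list(r)                      # current search direction
    rr = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rr <= tol * tol:          # stop once the residual norm is below tol
            break
        ap = apply_a(p)
        alpha = rr / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = sum(ri * ri for ri in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def apply_laplacian_plus_mass(v, mass=0.5):
    """Toy sparse operator: 1D periodic lattice Laplacian plus a mass term."""
    n = len(v)
    return [(2.0 + mass) * v[i] - v[i - 1] - v[(i + 1) % n] for i in range(n)]

# Point source on an 8-site lattice.
b = [1.0] + [0.0] * 7
x = conjugate_gradient(apply_laplacian_plus_mass, b)
residual = max(abs(ax - bi)
               for ax, bi in zip(apply_laplacian_plus_mass(x), b))
print(residual)
```

The solver only needs the action of the operator on a vector, which is what makes it suitable for sparse matrices. For this toy operator the ratio of largest to smallest eigenvalue is (4 + mass)/mass, so the system becomes increasingly ill conditioned as the mass is lowered, which illustrates in miniature why small fermion masses are expensive.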

Hybrid Monte Carlo's biggest competitor is probably the Local Boson Approxi- 
mation, which follows a similar strategy, using new degrees of freedom to stochas- 
tically evaluate the fermionic determinant's contribution to the Monte Carlo ac- 
ceptance probability [113,114]. Again computation of the forces requires solving 
a linear system. However, the Local Boson Approximation differs by choosing de- 
grees of freedom which each correspond to small segments of the Dirac operator's 
spectrum. This opens a path for computing large eigenvalues differently than small 
eigenvalues. Various efforts to compare Hybrid Monte Carlo with the Local Boson 
Approximation suggest that they have similar performance [113, 193]. 
Recently various researchers have become more aware of the importance of topolog- 
ical features of the gauge field in the fermion determinant [194]. The Atiyah-Singer 
index theorem states that fermion zero modes will be created when the topological 
index of the gauge field changes [195]. The resulting zero modes can create serious 
difficulties for stochastic unquenched algorithms. But there is also a great deal 
of speculation that the behavior of local topological features ("vortex condensation", 
instantons, etc.) is responsible for confinement [196] and chiral symmetry 
breaking [194,197-201]. 

Motivated by the computational difficulties caused by zero modes, recently Dun- 
can et al. have proposed a new strategy for unquenched calculations [202]. Rather 
than evaluate the entire fermionic determinant stochastically, they evaluate the small 
eigenvalues by solving the relevant eigenvalue problem, and then use a stochastic 
algorithm (a plaquette expansion, or alternatively the LBA) to evaluate the rest of 
the spectrum. This is the first algorithm that I'm aware of that explicitly exploits 
the fact that low energy modes may have much different physics than high energy 
modes. As such, it seems to be a step in the right direction. 

Another recent development has been the introduction of Neuberger's overlap 
operator [203-207], which is a special formulation of the Dirac operator. When this 
is used instead of the conventional Dirac operator, chirality is preserved, bringing 
us much closer to the real world which is believed to be nearly chiral. However, the 
overlap operator is much more expensive to compute [205]. 

10.2 Connections With Disordered Systems 

QCD is a disordered system. In the strong coupling limit, the lattice gauge links 
have a correlation length equal to zero; they are independent of each other [187]. 
In modern lattice calculations, the correlation length is a finite number of lattice 
spacings. It is expected that the physically interesting scenario is a weak coupling 
limit where the correlation length is physically finite and thus infinite compared to 
the lattice spacing. In this case, the disorder at any site is small, but the system as 
a whole is broken up into uncorrelated domains. 

Mesoscopic theory has traditionally concentrated on models where the disordered 
degrees of freedom are a scalar potential, not a gauge potential like that found in 
QCD. Note that scalar potential models may already give a qualitative understanding 
of certain QCD phenomena. However, some work has also been done on disordered 
gauge potentials. Most notably, a simplified gauge model, the random flux model, 
has been under intense scrutiny recently because of a conjecture that it is related to 
a good model of the integer quantum Hall effect [208,209]. Also, recently Zirnbauer 
extended the supersymmetric sigma model technique to create a sigma model for 
certain disordered gauge potentials [210-213]. 

Mesoscopic theory has traditionally concentrated on disorder with a correlation 
length equal to zero, because it is easier to handle mathematically than disorder with 
a finite correlation length. The step of making a connection to the finite correlation 
lengths in lattice QCD is not free from difficulty. One can, however, sweep such 
difficulties under the rug by assuming scaling in disordered systems, which means 
that the microscopic details of disorder are unimportant at longer length scales, 
except to set the initial conditions for renormalization group equations [214]. This 
argument indicates that finite-correlation-length systems are very much like zero- 
correlation-length systems, and would allow a direct link with QCD. However, the 
last decade has seen continued debate about the applicability of the renormalization 
group to disordered systems [56,70,214,215]. 

Notwithstanding these difficulties, the theory of disordered systems still has a 
lot to say about certain aspects of lattice QCD. A number of researchers have examined 
the spectra of Dirac operators in gauge field configurations determined by quenched 
QCD simulations, and have found that the low eigenvalues (below the Thouless 
energy) obey the level spacing statistics of random matrix theory [197-201]. Sigma 
model calculations predict the same result analytically. 
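
For readers unfamiliar with level spacing statistics, the following sketch evaluates the standard Wigner surmise approximations to the nearest-neighbor spacing distribution for the orthogonal (GOE) and unitary (GUE) random matrix ensembles, alongside the Poisson distribution expected for uncorrelated levels. Which ensemble is the appropriate comparison for a given Dirac operator depends on its symmetries and is not specified here; the integration routine and its parameters are assumptions made for the example.

```python
import math

def wigner_surmise_goe(s):
    """Wigner surmise for the Gaussian orthogonal ensemble."""
    return (math.pi / 2.0) * s * math.exp(-math.pi * s * s / 4.0)

def wigner_surmise_gue(s):
    """Wigner surmise for the Gaussian unitary ensemble."""
    return (32.0 / math.pi ** 2) * s * s * math.exp(-4.0 * s * s / math.pi)

def poisson_spacing(s):
    """Spacing distribution of uncorrelated levels."""
    return math.exp(-s)

def integrate(f, upper=30.0, steps=30000):
    """Midpoint-rule integral of f from 0 to upper."""
    h = upper / steps
    return h * sum(f((k + 0.5) * h) for k in range(steps))

# Each distribution is normalized and has unit mean spacing; the spacing s
# is measured in units of the local mean level spacing.
for name, dist in [("GOE", wigner_surmise_goe),
                   ("GUE", wigner_surmise_gue),
                   ("Poisson", poisson_spacing)]:
    norm = integrate(dist)
    mean = integrate(lambda s: s * dist(s))
    print(name, round(norm, 4), round(mean, 4), dist(0.0))
```

The qualitative signature is the behavior at small spacing: the random matrix distributions vanish at s = 0 (level repulsion), while the Poisson distribution is largest there.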

Disorder induced localization is also observed in lattice QCD. Localization can 
be expected in random gauge models as long as no symmetry prohibits it. The 
random flux model, for instance, exhibits localization throughout its energy band, 
with the possible (and debated) exception of its center where there is a discrete 
symmetry [216]. Localization is seen in quenched simulations of QCD [217-219]. 
However, unquenching QCD may unlocalize the fermionic modes with energies close 
to zero that form the chiral condensate [199]. Indeed, Duncan et al. reported the 
absence of zero modes in unquenched calculations [202]. 

In the last year and a half the lattice QCD community has made its first attempts 
at understanding unquenched QCD's disorder-related physics. These efforts have 
been motivated by two considerations: 

1. One of the most popular discretizations of QCD is the staggered discretization. 
The staggered approach unfortunately multiplies the number of fermions by 
four, and therefore one needs to somehow get rid of two of the four, leaving 
just the up and down quarks. The traditional prescription for doing this has 
been to weight the QCD partition function with the square root of the fermion 
determinant instead of with the whole fermion determinant. 

This prescription is mathematically equivalent to using the square root of 
the Dirac operator in the original action of lattice QCD. Recently the lattice 
community has become painfully aware that an action with a square root in 
it may not correspond to a local theory. In a plenary talk at the most recent 
lattice QCD conference, Kennedy [190] described the consequences: 

"If a QFT is local then we are guaranteed that it has the cluster decom- 
position property, and that within the context of renormalized perturbation 
theory it satisfies the familiar power counting rules, exhibits universality, and 
is amenable to a systematic improvement procedure. On the other hand, if it 
is not local then there is little we can say about these properties other than 
that we have no a priori reason to expect them to hold. In particular, if 
we have a non-local lattice theory then we have no good reason to invoke 
power-counting arguments to justify taking the naive continuum limit, or to 
expect the lattice theory to be described by continuum perturbation theory 
however small the lattice spacing. The fact that a formulation is not mani- 
festly local does not logically imply that it is not local.... However, in general 
a non-manifestly local theory has no reason to be equivalent to a local one. 
Even if there was a local action corresponding to taking fractional powers of 
the fermion determinant in the functional integral, we are still required to use 
this local action to measure fermionic quantities.... We should not expect that 
measuring operators corresponding to a local four taste valence action M on 
configurations generated with $\sqrt{\det M}$ to lead to consistent results. Not only 
might there be unknown renormalisations of the parameters between the sea 
and valence actions (e.g., what is the justification for using the same numerical 
value for the quark masses?) but the degrees of freedom are not even the 
same." 

In summary, the square root trick is an uncontrolled approximation, and can 
completely change the results, unless one can prove that a particular square 
root of the Dirac operator is local in some sense and can be used to construct 
a local field theory. This question may be hard to decide because there are many 
ways of taking roots of matrices, which are analogous to the many possible 
choices of branch cuts in scalar roots. A few researchers have studied the 
locality of roots of the Dirac operator, and the only major result so far is that 
the most straightforward choice of root is non-local [220,221]. 

2. Recently there has been some doubt about whether Neuberger's overlap op- 
erator can be relied on to always deliver chiral symmetry, which is of course 
the reason why one would use the operator. The mechanism of failure would 
depend on the localization of the Dirac operator's eigenfunctions. A recent 
paper by Golterman, Shamir, and Svetitsky [222] explains this a bit more 
deeply, discussing first domain wall fermions (which approximate the overlap 
operator) and then the overlap operator: 

"Domain-wall fermions employ an auxiliary, discrete, and (in practice) finite 
fifth dimension with spacing $a_5$ and [...] sites. Finiteness of the fifth dimension 
ensures locality but leads to "residual" violations of chiral symmetry...." If the 
wavefunctions of the Dirac operator stretch across the whole fifth dimension 
and feel its finite size, then the domain wall approach will not deliver chiral 
symmetry. Therefore one needs to know that these eigenfunctions are local- 
ized in the fifth dimension; that they die off exponentially. One concludes 
that understanding the localization physics of QCD is crucial to validating 
the domain wall approach. 

In the limit where the fifth dimension becomes exponentially long, domain- 
wall fermions are equivalent to the overlap operator; one might expect that 
the overlap operator's validity will also be determined by localization physics. 
The authors state: "For overlap fermions, chiral symmetry is guaranteed, but 
not locality [of the overlap operator].... Deteriorating locality, caused by the 
proximity of the Aoki phase, will distort physical predictions in an uncontrolled 
way." (The Aoki phase corresponds to spontaneously breaking the symmetry 
between the three pions, so that two of them become massless and therefore 
are even longer ranged than they should be. It does not correspond to the 
physical continuum limit, but can nonetheless arise in lattice simulations.) 

And now the obvious conclusion: "This brings us to propose that the range 
of the overlap operator, as well as the key spectral quantities of $H_W$ that 
control it - $\lambda_c$, $\rho(0)$, and $l_l(0)$ - should be calculated in any overlap-fermion 
simulation, much as $m_{\rm res}$ is routinely determined in domain-wall fermion 
simulations." In other words, Golterman et al. are suggesting a requirement 
that the localization physics be understood in all overlap calculations. 
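
The remark in item 1 that there are many ways of taking roots of a matrix can be made concrete with a short worked example. As a sketch, assume that the operator $M$ is an $N \times N$ diagonalizable matrix with distinct nonzero eigenvalues $\lambda_i$; the symbols $U$, $\Lambda$, and $\sigma_i$ are notation introduced for this illustration and are not taken from the cited references.

```latex
% Diagonalize M, then choose a sign for each eigenvalue's square root.
M = U \Lambda U^{-1}, \qquad
\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N),
\\[4pt]
M^{1/2}_{\sigma} = U \,
  \mathrm{diag}\!\left(\sigma_1 \sqrt{\lambda_1}, \ldots,
                       \sigma_N \sqrt{\lambda_N}\right) U^{-1},
\qquad \sigma_i = \pm 1,
\\[4pt]
\left(M^{1/2}_{\sigma}\right)^2 = M
\quad \text{for every one of the } 2^N \text{ sign choices,}
\qquad
\det M^{1/2}_{\sigma}
  = \Big(\prod_i \sigma_i\Big) \prod_i \sqrt{\lambda_i}
  = \pm \sqrt{\det M}.
```

All $2^N$ matrices are legitimate square roots of $M$ and reproduce $\sqrt{\det M}$ up to an overall sign, but they are different matrices in the position basis, so there is no reason for them to share the same locality properties. This is the matrix analogue of choosing a branch cut for a scalar root.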